SOURCE CODING ENHANCEMENT USING SPECTRAL-BAND REPLICATION



TECHNICAL FIELD

In source coding systems, digital data is compressed before transmission or storage to reduce the required bitrate or
storage capacity. The present invention relates to a new method and apparatus for the improvement of source coding
systems by means of Spectral Band Replication (SBR). Substantial bitrate reduction is achieved while maintaining
the same perceptual quality or, conversely, an improvement in perceptual quality is achieved at a given bitrate. This
is accomplished by means of spectral bandwidth reduction at the encoder side and subsequent spectral band
replication at the decoder, whereby the invention exploits new concepts of signal redundancy in the spectral domain.

BACKGROUND OF THE INVENTION 

Audio source coding techniques can be divided into two classes: natural audio coding and speech coding. Natural
audio coding is commonly used for music or arbitrary signals at medium bitrates, and generally offers wide audio
bandwidth. Speech coders are basically limited to speech reproduction but can, on the other hand, be used at very low
bitrates, albeit with low audio bandwidth. Wideband speech offers a major subjective quality improvement over
narrowband speech. Increasing the bandwidth not only improves the intelligibility and naturalness of speech, but also
facilitates speaker recognition. Wideband speech coding is thus an important issue in next-generation telephone
systems. Further, due to the tremendous growth of the multimedia field, transmission of music and other non-speech
signals at high quality over telephone systems is a desirable feature.

A high-fidelity linear PCM signal is very inefficient in terms of bitrate versus perceptual entropy. The CD
standard dictates a 44.1 kHz sampling frequency, 16 bits per sample resolution and stereo. This equals a bitrate of
1411 kbit/s. To drastically reduce the bitrate, source coding can be performed using split-band perceptual audio
codecs. These natural audio codecs exploit perceptual irrelevancy and statistical redundancy in the signal. Using the
best codec technology, approximately 90% data reduction can be achieved for a standard CD-format signal with
practically no perceptible degradation. Very high sound quality in stereo is thus possible at around 96 kbit/s, i.e. a
compression factor of approximately 15:1. Some perceptual codecs offer even higher compression ratios. To
achieve this, it is common to reduce the sample rate and thus the audio bandwidth. It is also common to decrease the
number of quantization levels, allowing occasionally audible quantization distortion, and to degrade
the stereo field through intensity coding. Excessive use of such methods results in annoying perceptual degradation.
Current codec technology is near saturation, and further progress in coding gain is not expected. In order to improve
the coding performance further, a new approach is necessary.
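The bitrate figures above follow from simple arithmetic. A minimal check, using the CD parameters and the 96 kbit/s stereo rate quoted in the text (the constant names are illustrative only):

```python
SAMPLE_RATE_HZ = 44_100     # CD sampling frequency
BITS_PER_SAMPLE = 16        # CD resolution
CHANNELS = 2                # stereo
CODEC_KBPS = 96.0           # "very high sound quality" stereo rate quoted in the text

pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000   # 1411.2 kbit/s
compression_factor = pcm_kbps / CODEC_KBPS                      # about 14.7, i.e. roughly 15:1
data_reduction = 1.0 - CODEC_KBPS / pcm_kbps                    # about 0.93, i.e. more than 90 %

print(f"linear PCM: {pcm_kbps:.1f} kbit/s")
print(f"compression factor at {CODEC_KBPS:.0f} kbit/s: {compression_factor:.1f}:1")
print(f"data reduction: {data_reduction:.0%}")
```

The exact ratio is slightly below 15:1, and the corresponding data reduction is slightly above 90%, consistent with the approximate values in the text.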

The human voice and most musical instruments generate quasi-stationary signals that emerge from oscillating
systems. According to Fourier theory, any periodic signal may be expressed as a sum of sinusoids with the
frequencies f, 2f, 3f, 4f, 5f, etc., where f is the fundamental frequency. The frequencies form a harmonic series. A
bandwidth limitation of such a signal is equivalent to a truncation of the harmonic series. Such a truncation alters the
perceived timbre, the tone colour, of a musical instrument or voice, and yields an audio signal that will sound "muffled".
Prior art methods are mainly intended for the improvement of speech codec performance and in particular for
High Frequency Regeneration (HFR), an issue in speech coding. Such methods employ broadband linear frequency
shifts, non-linearities or aliasing [U.S. Pat. 5,127,054], generating intermodulation products or other non-harmonic
frequency components which cause severe dissonance when applied to music signals. Such dissonance is referred to
in the speech coding literature as "harsh" and "rough" sounding. Other synthetic speech HFR methods generate
sinusoidal harmonics that are based on fundamental pitch estimation and are thus limited to tonal stationary sounds
[U.S. Pat. 4,771,465]. Such prior art methods, although useful for low-quality speech applications, do not work for
high-quality speech or music signals. A few methods attempt to improve the performance of high-quality audio
source codecs. One uses synthetic noise signals generated at the decoder to substitute noise-like signals in speech or
music previously discarded by the encoder ["Improving Audio Codecs by Noise Substitution" D. Schultz, JAES,
Vol. 44, No. 7/8, 1996]. This is performed within an otherwise normally transmitted highband on an intermittent
basis, when noise signals are present. Another method recreates some missing highband harmonics that were lost in
the coding process ["Audio Spectral Coder" A.J.S. Ferreira, AES Preprint 4201, 100th Convention, May 11-14,
1996, Copenhagen] and is again dependent on tonal signals and pitch detection. Both methods operate on a low
duty-cycle basis, offering comparatively limited coding or performance gain.

SUMMARY OF THE INVENTION 

The present invention provides a new method and an apparatus for substantial improvements of digital source 
coding systems and more specifically for the improvements of audio codecs. The objective includes bitrate 
reduction or improved perceptual quality or a combination thereof. The invention is based on new methods 
exploiting harmonic redundancy, offering the possibility to discard passbands of a signal prior to transmission or 
storage. No perceptual degradation is perceived if the decoder performs high quality spectral replication according 
to the invention. The discarded bits represent the coding gain at a fixed perceptual quality. Alternatively, more bits 
can be allocated for encoding of the lowband information at a fixed bitrate, thereby achieving a higher perceptual 
quality. 



The present invention postulates that a truncated harmonic series can be extended based on the direct relation
between lowband and highband spectral components. This extended series resembles the original in a perceptual
sense if certain rules are followed. First, the extrapolated spectral components must be harmonically related to the
truncated harmonic series, in order to avoid dissonance-related artifacts. The present invention uses transposition as
a means for the spectral replication process, which ensures that this criterion is met. It is, however, not necessary that
the lowband spectral components form a harmonic series for successful operation, since new replicated components,
harmonically related to those of the lowband, will not alter the noise-like or transient nature of the signal. A
transposition is defined as a transfer of partials from one position to another on the musical scale while maintaining
the frequency ratios of the partials. Second, the spectral envelope, i.e. the coarse spectral distribution, of the
replicated highband must reasonably well resemble that of the original signal. The present invention offers two
modes of operation, SBR-1 and SBR-2, that differ in the way the spectral envelope is adjusted.
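The first rule can be illustrated with a toy harmonic series. The sketch below is an illustration only: it assumes integer frequencies in Hz so that membership of the harmonic series can be checked exactly, and the helper names are hypothetical. It shows that transposing the upper lowband partials by an integer factor of 2 keeps every new partial on the harmonic series, whereas a broadband linear shift onto the same frequency region (the prior-art approach) does not:

```python
def truncated_series(f0: int, cutoff: int) -> list[int]:
    """Harmonics k*f0 below `cutoff`: a bandwidth-limited (truncated) harmonic series."""
    return [k * f0 for k in range(1, cutoff // f0 + 1) if k * f0 < cutoff]


def transpose(partials: list[int], m: int) -> list[int]:
    """Transposition by an integer factor m: every frequency is multiplied by m, preserving ratios."""
    return [m * f for f in partials]


def linear_shift(partials: list[int], shift: int) -> list[int]:
    """Broadband linear frequency shift: the same offset is added to every frequency."""
    return [f + shift for f in partials]


def on_harmonic_series(freqs: list[int], f0: int) -> bool:
    """True if every frequency is an integer multiple of the fundamental f0."""
    return all(f % f0 == 0 for f in freqs)


f0, cutoff = 220, 4000
lowband = truncated_series(f0, cutoff)                  # 220, 440, ..., 3960 Hz
upper_half = [f for f in lowband if f >= cutoff // 2]   # partials a factor-2 transposition moves above the cutoff
replicated = transpose(upper_half, 2)                   # lands in [cutoff, 2*cutoff)
shifted = linear_shift(upper_half, 2000)                # moves the same band up by a fixed 2000 Hz

print(on_harmonic_series(replicated, f0))   # prints True
print(on_harmonic_series(shifted, f0))      # prints False
```

Because 2000 Hz is not a multiple of 220 Hz, every linearly shifted partial misses the series by the same 20 Hz remainder; the transposed partials are all multiples of 440 Hz and therefore harmonics of 220 Hz.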

SBR-1, intended for the improvement of intermediate-quality codec applications, is a single-ended process which
relies exclusively on the information contained in a received lowband or lowpass signal at the decoder. The spectral
envelope of this signal is determined and extrapolated, for instance using polynomials together with a set of rules or
a codebook. This information is used to continuously adjust and equalise the replicated highband. The present
SBR-1 method offers the advantage of post-processing, i.e. no modifications are needed at the encoder side. A
broadcaster will gain in channel utilisation or will be able to offer improved perceptual quality, or a combination of
both. Existing bitstream syntax and standards can be used without modification.

SBR-2, intended for the improvement of high-quality codec applications, is a double-ended process where, in
addition to the transmitted lowband signal according to SBR-1, the spectral envelope of the highband is encoded and
transmitted. Since the spectral envelope varies at a much lower rate than the highband signal
components, only a limited amount of information needs to be transmitted in order to successfully represent the
spectral envelope. SBR-2 can be used to improve the performance of current codec technologies with no or minor
modifications of existing syntax or protocols, and as a valuable tool for future codec development.

Both SBR-1 and SBR-2 can be used to replicate smaller passbands of the lowband when such bands are shut down
by the encoder as stipulated by the psychoacoustic model under bit-starved conditions. This results in improvement
of the perceptual quality by spectral replication within the lowband in addition to spectral replication outside the
lowband. Further, SBR-1 and SBR-2 can also be used in codecs employing bitrate scalability, where the perceptual
quality of the signal at the receiver varies depending on transmission channel conditions. This usually implies
annoying variations of the audio bandwidth at the receiver. Under such conditions, the SBR methods can be used
successfully to maintain a constantly high bandwidth, again improving the perceptual quality.

The present invention operates on a continuous basis, replicating any type of signal content, i.e. tonal or non-tonal
(noise-like and transient signals). In addition, the present spectral replication method creates a perceptually accurate
replica of the discarded bands from available frequency bands at the decoder. Hence, the SBR method offers a
substantially higher level of coding gain or perceptual quality improvement compared to prior art methods. The
invention can be combined with such prior art codec improvement methods; however, no performance gain is
expected from such combinations.

The SBR-method comprises the following steps: 

- encoding of a signal derived from an original signal, where frequency bands of the signal are discarded and 
the discarding is performed prior to or during encoding, forming a first signal, 

- during or after decoding of the first signal, transposing frequency bands of the first signal, forming a second
signal,

- performing spectral envelope adjustment, and 

- combining the decoded signal and the second signal, forming an output signal. 
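The four steps can be traced on a toy signal described directly as a list of partials (frequency in Hz, amplitude). This is a minimal sketch, not the invention's implementation: it assumes an integer transposition factor M = 2, treats "discarding frequency bands" as dropping partials above a cutoff, and stands in for the spectral envelope with a single transmitted highband energy value (an SBR-2 style side parameter). All function names are hypothetical.

```python
import math

Partial = tuple[float, float]  # (frequency in Hz, amplitude)


def encode(partials: list[Partial], f_cut: float) -> tuple[list[Partial], float]:
    """Discard the band at and above f_cut (forming the 'first signal') and keep one
    coarse envelope value: the energy of the discarded highband (assumed side information)."""
    lowband = [(f, a) for f, a in partials if f < f_cut]
    highband_energy = sum(a * a for f, a in partials if f >= f_cut)
    return lowband, highband_energy


def decode(lowband: list[Partial], highband_energy: float, f_cut: float, m: int = 2) -> list[Partial]:
    # Transposition: partials in [f_cut/m, f_cut) are moved by a factor m into [f_cut, m*f_cut).
    second = [(m * f, a) for f, a in lowband if f_cut / m <= f < f_cut]
    # Spectral envelope adjustment: one broadband gain so that the replicated band
    # carries the transmitted highband energy.
    replicated_energy = sum(a * a for _, a in second)
    gain = math.sqrt(highband_energy / replicated_energy) if replicated_energy > 0 else 0.0
    adjusted = [(f, gain * a) for f, a in second]
    # Combination of the decoded lowband and the adjusted second signal.
    return sorted(lowband + adjusted)


original = [(200.0 * k, 1.0 / k) for k in range(1, 41)]   # harmonics up to 8 kHz with 1/k roll-off
low, env = encode(original, f_cut=4000.0)
output = decode(low, env, f_cut=4000.0)
print([round(f) for f, _ in output if f >= 4000.0])
```

With M = 2 the replica contains only every second harmonic of the original highband, and a real envelope adjuster would use many frequency bands rather than one gain; the sketch makes no attempt to address either point.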

The passbands of the second signal may be set not to overlap or to partly overlap the passbands of the first signal, and
may be set depending on the temporal characteristics of the original signal and/or the first signal, or on transmission
channel conditions. The spectral envelope adjustment is performed based on an estimation of the original spectral
envelope from said first signal, or on transmitted envelope information of the original signal.
The present invention includes two basic types of transposers: multiband transposers and time-variant pattern search
prediction transposers, having different properties. A basic multiband transposition may be performed according to
the present invention by the following:

- filtering the signal to be transposed through a set of $N \geq 2$ bandpass filters with passbands comprising the
frequencies $[f_1, f_2], [f_2, f_3], \ldots, [f_N, f_{N+1}]$ respectively, forming N bandpass signals,

- shifting the bandpass signals in frequency to regions comprising the frequencies $M[f_1, f_2], M[f_2, f_3], \ldots, M[f_N, f_{N+1}]$,
where $M \neq 1$ is the transposition factor, and

- combining the shifted bandpass signals, forming the transposed signal. 
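The text does not say how far each band is shifted. One simple choice, assumed here, is to move band i by $(M-1)c_i$, where $c_i$ is its centre frequency, so that the centre lands exactly on $Mc_i$ and, for M > 1, the shifted band lies inside $M[f_i, f_{i+1}]$. A partial at f then ends up at $f + (M-1)c_i$ instead of $Mf$, a deviation of $(M-1)|c_i - f|$, at most $(M-1)(f_{i+1}-f_i)/2$, so narrower bands approximate an exact transposition more closely. The sketch operates on partial frequencies rather than on sampled signals, and the function names are hypothetical:

```python
import bisect


def multiband_transpose(freqs: list[float], edges: list[float], m: float) -> list[float]:
    """Shift each partial by (m - 1) times the centre frequency of the band it falls in.

    `edges` are the band edges f_1 < f_2 < ... < f_{N+1}; band i covers [edges[i], edges[i+1]).
    """
    out = []
    for f in freqs:
        i = bisect.bisect_right(edges, f) - 1
        if not 0 <= i < len(edges) - 1:
            raise ValueError(f"{f} Hz lies outside the filterbank range")
        centre = 0.5 * (edges[i] + edges[i + 1])
        out.append(f + (m - 1) * centre)
    return out


def max_deviation(freqs: list[float], edges: list[float], m: float) -> float:
    """Largest |shifted - m*f| over the given partials: the error versus exact transposition."""
    shifted = multiband_transpose(freqs, edges, m)
    return max(abs(s - m * f) for s, f in zip(shifted, freqs))


partials = [220.0 * k for k in range(5, 18)]          # 1100 ... 3740 Hz
narrow = [1000.0 + 100.0 * i for i in range(31)]      # 100 Hz bands from 1 to 4 kHz
wide = [1000.0 + 500.0 * i for i in range(7)]         # 500 Hz bands from 1 to 4 kHz
print(max_deviation(partials, narrow, 2), max_deviation(partials, wide, 2))
```

With these numbers the worst-case deviation is 50 Hz for the 100 Hz bands and 230 Hz for the 500 Hz bands, within the bounds of 50 Hz and 250 Hz given above.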

Alternatively, this basic multiband transposition may be performed according to the invention by the following: 

- bandpass filtering the signal to be transposed using an analysis filterbank or transform of such a nature
that real- or complex-valued subband signals of lowpass type are generated,

- an arbitrary number of channels k of said analysis filterbank or transform are connected to channels Mk, $M \neq 1$,
in a synthesis filterbank or transform, and

- the transposed signal is formed using the synthesis filterbank or transform. 
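The filterbank variant amounts to routing analysis channel k to synthesis channel Mk. A minimal illustration, assuming a single frame, a plain DFT as a stand-in for the analysis and synthesis filterbank, an integer factor M, and no windowing, overlap or phase adjustment (so it corresponds to the basic, not the improved, multiband transposition); the function names are hypothetical:

```python
import cmath
import math


def dft(x: list[float]) -> list[complex]:
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
            for k in range(n_len)]


def idft(spectrum: list[complex]) -> list[complex]:
    n_len = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_len) for k in range(n_len)) / n_len
            for n in range(n_len)]


def channel_transpose(frame: list[float], m: int) -> list[float]:
    """Connect analysis channel k to synthesis channel m*k (only where m*k is below Nyquist)."""
    n_len = len(frame)
    spectrum = dft(frame)
    out = [0j] * n_len                      # DC and unmapped channels stay empty
    for k in range(1, n_len // 2):
        if m * k < n_len // 2:
            out[m * k] = spectrum[k]
            out[n_len - m * k] = spectrum[n_len - k]   # mirrored bin keeps the output real
    return [v.real for v in idft(out)]


N_FRAME = 64
frame = [math.cos(2 * math.pi * 5 * n / N_FRAME) for n in range(N_FRAME)]   # tone in channel 5
moved = channel_transpose(frame, 2)                                         # tone now in channel 10
```

A tone centred on channel k reappears at channel Mk, i.e. at M times its frequency; a practical system would run this frame by frame with overlapping windows and a proper filterbank.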

An improved multiband transposition according to the invention incorporates phase adjustments, enhancing the
performance of the basic multiband transposition.

The time-variant pattern search prediction transposition according to the present invention may be performed by the
following:

- performing transient detection on the first signal, 

- determining which segment of the first signal to be used when duplicating/discarding parts of the first signal 
depending on the outcome of the transient detection, 

- adjusting statevector and codebook properties depending on the outcome of the transient detection, and 

- searching for synchronisation points in the chosen segment of the first signal, based on the synchronisation point
found in the previous synchronisation point search.

The SBR methods and apparatuses according to the present invention offer the following features:

1. The methods and apparatuses exploit new concepts of signal redundancy in the spectral domain.

2. The methods and apparatuses are applicable to arbitrary signals.

3. Each harmonic set is individually created and controlled. 

4. All replicated harmonics are generated in such a manner as to form a continuation of the existing harmonic 
series. 

5. The spectral replication process is based on transposition and creates no or imperceptible artifacts. 

6. The spectral replication can cover multiple smaller bands and/or a wide frequency range. 

7. In the SBR-1 method, the processing is performed at the decoder side only, i.e. all standards and protocols 
can be used without modification. 

8. The SBR-2 method can be implemented in accordance with most standards and protocols with no or minor 
modifications. 

9. The SBR-2 method offers the codec designer a new powerful compression tool. 

10. The coding gain is significant.
The most attractive application relates to the improvement of various types of low bitrate codecs, such as MPEG 1/2
Layer I/II/III [U.S. Pat. 5,040,217], MPEG 2/4 AAC, Dolby AC-2/3, NTT TwinVQ [U.S. Pat. 5,684,920],
AT&T/Lucent PAC, etc. The invention is also useful in high-quality speech codecs such as wideband CELP and
SB-ADPCM G.722 etc. to improve perceived quality. The above codecs are widely used in multimedia, in the
telephone industry, on the Internet, as well as in professional applications. T-DAB (Terrestrial Digital Audio
Broadcasting) systems use low bitrate protocols that will gain in channel utilisation by using the present method, or
improve quality in FM and AM DAB. Satellite S-DAB will gain considerably, due to the excessive system costs
involved, by using the present method to increase the number of programme channels in the DAB multiplex.
Furthermore, for the first time, full-bandwidth real-time audio streaming over the Internet is achievable using low
bitrate telephone modems.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described by way of illustrative examples, not limiting the scope or spirit of the
invention, with reference to the accompanying drawings, in which:

Fig. 1 illustrates SBR incorporated in a coding system according to the present invention; 

Fig. 2 illustrates spectral replication of upper harmonics according to the present invention; 

Fig. 3 illustrates spectral replication of inband harmonics according to the present invention; 

Fig. 4 is a block diagram for a time-domain implementation of a transposer according to the present invention; 

Fig. 5 is a flow-chart representing a cycle of operation for the pattern-search prediction transposer according to the
present invention;

Fig. 6 is a flow-chart representing the search for synchronisation point according to the present invention; 
Fig. 7a - 7b illustrate the codebook positioning during transients according to the present invention;
Fig. 8 is a block diagram for an implementation of several time-domain transposers in connection with a suitable
filterbank, for SBR operation according to the present invention;

Fig. 9a - 9c are block diagrams representing a device for STFT analysis and synthesis configured for generation
of 2nd order harmonics according to the present invention;

Fig. 10a - 10b are block diagrams of one sub-band with a linear frequency shift in the STFT device according to 
the present invention; 

Fig. 11 shows one sub-band using a phase-multiplier according to the present invention;

Fig. 12 illustrates how 3rd order harmonics are generated according to the present invention;

Fig. 13 illustrates how 2nd and 3rd order harmonics are generated simultaneously according to the present
invention;

Fig. 14 illustrates generation of a non-overlapping combination of several harmonic orders according to the 
present invention; 

Fig. 15 illustrates generation of an interleaved combination of several harmonic orders according to the present 
invention; 

Fig. 16 illustrates generation of broadband linear frequency shifts; 

Fig. 17 illustrates how sub-harmonics are generated according to the present invention; 

Fig. 18a - 18b are block diagrams of a perceptual codec; 

Fig. 19 shows a basic structure of a maximally decimated filterbank; 
Fig. 20 illustrates generation of 2nd order harmonics in a maximally decimated filterbank according to the present
invention;

Fig. 21 is a block diagram for the improved multiband transposition in a maximally decimated filterbank 
operating on subband signals according to the present invention; 

Fig. 22 is a flowchart representing the improved multiband transposition in a maximally decimated filterbank 
operating on subband signals according to the present invention; 
Fig. 23 illustrates subband samples and scalefactors of a typical codec;

Fig. 24 illustrates subband samples and envelope information for SBR-2 according to the present invention;

Fig. 25 illustrates hidden transmission of envelope information in SBR-2 according to the present invention;

Fig. 26 illustrates redundancy coding in SBR-2 according to the present invention;

Fig. 27 illustrates an implementation of a codec using the SBR-1 method according to the present invention;

Fig. 28 illustrates an implementation of a codec using the SBR-2 method according to the present invention; and

Fig. 29 is a block diagram of a "pseudo-stereo" generator according to the present invention.

DESCRIPTION OF PREFERRED EMBODIMENTS 

Throughout the explanation of the embodiments herein, emphasis is given to natural audio source coding
applications. However, it should be understood that the present invention is applicable to a range of source coding
applications other than that of encoding and decoding audio signals.

Transposition basics 

Transposition, as defined according to the present invention, is the ideal method for spectral replication, and has
several major advantages over prior art: no pitch detection is required, equally high performance is obtained for
single-pitched and polyphonic programme material, and the transposition works equally well for tonal and
non-tonal signals. Contrary to other methods, the transposition according to the invention can be used in arbitrary
audio source coding systems for arbitrary signal types.

An exact transposition by a factor M of a discrete-time signal x(n) in the form of a sum of cosines with time-varying
amplitudes is defined by the relation

$$x(n) = \sum_{i=0}^{N-1} e_i(n) \cos\left( 2\pi \frac{f_i}{f_s} n + \alpha_i \right) \qquad (1)$$

$$y(n) = \sum_{i=0}^{N-1} e_i(n) \cos\left( 2\pi \frac{M f_i}{f_s} n + \beta_i \right) \qquad (2)$$

where N is the number of sinusoids, hereafter referred to as partials, $f_i$, $e_i(n)$ and $\alpha_i$ are the individual input
frequencies, time envelopes and phase constants respectively, $\beta_i$ are the arbitrary output phase constants, and $f_s$ is
the sampling frequency, with $0 \leq M f_i \leq f_s/2$.
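Eqs. (1) and (2) can be evaluated directly. The sketch below assumes constant envelopes $e_i(n) = e_i$ (the text allows them to vary with time), uses a hypothetical function name, and enforces the stated condition $0 \leq M f_i \leq f_s/2$:

```python
import math


def exact_transposition(partials, m, fs, n_samples):
    """Return (x, y) following Eqs. (1) and (2).

    partials: list of (f_i, e_i, alpha_i, beta_i) with a constant envelope e_i (assumption).
    """
    for f, _, _, _ in partials:
        if not 0 <= m * f <= fs / 2:
            raise ValueError(f"M*f_i = {m * f} Hz violates 0 <= M*f_i <= fs/2")
    x = [sum(e * math.cos(2 * math.pi * f / fs * n + alpha) for f, e, alpha, _ in partials)
         for n in range(n_samples)]
    y = [sum(e * math.cos(2 * math.pi * m * f / fs * n + beta) for f, e, _, beta in partials)
         for n in range(n_samples)]
    return x, y


# Two partials at 1000 Hz and 1500 Hz transposed by M = 2 to 2000 Hz and 3000 Hz (fs = 8 kHz).
x, y = exact_transposition([(1000.0, 1.0, 0.0, 0.0), (1500.0, 0.5, 0.3, 0.0)], m=2, fs=8000.0, n_samples=16)
```

Each partial keeps its envelope, only its frequency is scaled by M and its phase constant is replaced by the free output phase $\beta_i$.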
representation X(f) is bandlimited to the range 0 to $f_{\max}$ 201. The signal content in the range $f_{\max}/M$ to
$Q f_{\max}/M$, where Q is the desired bandwidth expansion factor, $1 < Q \leq M$, is extracted by means of a bandpass filter,
forming a bandpass signal with spectrum $X_{BP}(f)$ 203. The bandpass signal is transposed by a factor M, forming a second
bandpass signal with spectrum $X_T(f)$ covering the range $f_{\max}$ to $Q f_{\max}$ 205. The spectral envelope of this signal is
adjusted by means of a programme-controlled equaliser, forming a signal with spectrum $X_E(f)$ 207. This signal is
then combined with a delayed version of the input signal in order to compensate for the delay imposed by the
bandpass filter and transposer, whereby an output signal with spectrum Y(f) covering the range 0 to $Q f_{\max}$ is formed
209. Alternatively, bandpass filtering may be performed after the transposition M, using cutoff frequencies $f_{\max}$ and
$Q f_{\max}$. By using multiple transposers, simultaneous generation of different harmonic orders is of course possible.
The above scheme may also be used to "fill in" stopbands within the input signal, as shown in Fig. 3, where the
input signal has a stopband extending from $f_0$ to $Q f_0$ 301. A passband $[f_0/M, Q f_0/M]$ is then extracted 303,
transposed by a factor M to $[f_0, Q f_0]$ 305, envelope adjusted 307 and combined with the delayed input signal, forming
the output signal with spectrum Y(f) 309.
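The band bookkeeping in both cases is the same: a source band below the edge frequency is extracted and transposed by M onto the band starting at the edge. A minimal sketch with a hypothetical helper name; the constraint $1 < Q \leq M$ keeps the source band inside the available lowband, since $Q f/M \leq f$ exactly when $Q \leq M$:

```python
def replication_bands(f_edge: float, m: float, q: float):
    """Source band to extract and target band after transposition by m.

    f_edge is f_max for highband replication (Fig. 2) or f_0 for filling a stopband (Fig. 3).
    """
    if not 1 < q <= m:
        raise ValueError("expected 1 < Q <= M")
    source = (f_edge / m, q * f_edge / m)   # bandpass filter passband
    target = (f_edge, q * f_edge)           # range covered after transposition
    return source, target


# A 4 kHz lowband extended to 8 kHz with M = Q = 2 uses the 2-4 kHz band as source.
print(replication_bands(4000.0, 2, 2))
```

For this example the call returns a 2-4 kHz source band and a 4-8 kHz target band; with M = 3 and Q = 1.5 on a 6 kHz edge, the 2-3 kHz band is moved to 6-9 kHz.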

An approximation of an exact transposition may be used. According to the present invention, the quality of such
approximations is determined using dissonance theory. A criterion for dissonance is presented by Plomp ["Tonal
Consonance and Critical Bandwidth" R. Plomp, W. J. M. Levelt, JASA, Vol. 38, 1965], and states that two partials
are considered dissonant if the frequency difference is within approximately 5 to 50% of the bandwidth of the
critical band in which the partials are situated. For reference, the critical bandwidth for a given frequency can be
approximated by

$$cb(f) = 25 + 75\left(1 + 1.4\left(\frac{f}{1000}\right)^2\right)^{0.69} \qquad (3)$$

with f and cb in Hz. Further, Plomp states that the human auditory system cannot discriminate two partials if they
differ in frequency by approximately less than five percent of the critical bandwidth in which they are situated. The
exact transposition in Eq. 2 is approximated by

$$\hat{y}(n) = \sum_{i=0}^{N-1} e_i(n) \cos\left( 2\pi \frac{M f_i \pm \Delta f_i}{f_s} n + \beta_i \right) \qquad (4)$$

where $\Delta f_i$ is the deviation from the exact transposition. If the input partials form a harmonic series, a hypothesis of
the invention states that the deviations of the transposed partials from the harmonic series must not exceed five
percent of the critical bandwidth in which they are situated. This would explain why prior art methods give
unsatisfactory "harsh" and "rough" results, since broadband linear frequency shifts yield a much larger deviation
than is acceptable. When prior art methods produce more than one partial for a single input partial, the partials must
nevertheless be within the above stated deviation limit so as to be perceived as one partial. This again explains the
poor results obtained with prior art methods using non-linearities etc., since they produce intermodulation partials
outside the limit of deviation.
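Eq. (3) and the five-percent limit can be combined into a simple check. The sketch below is illustrative only; the function names and example frequencies are assumptions. It compares the exact second-order transposition of a 1320 Hz partial (2640 Hz) with a small 5 Hz error, and with a broadband linear shift of 1100 Hz, which maps a 1100 Hz partial exactly onto 2200 Hz but moves the 1320 Hz partial to 2420 Hz:

```python
def critical_bandwidth(f: float) -> float:
    """Eq. (3): approximate critical bandwidth in Hz at frequency f in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f / 1000.0) ** 2) ** 0.69


def within_deviation_limit(f_exact: float, f_actual: float, fraction: float = 0.05) -> bool:
    """True if |f_actual - f_exact| is below `fraction` of the critical bandwidth at f_exact."""
    return abs(f_actual - f_exact) < fraction * critical_bandwidth(f_exact)


f_exact = 2 * 1320.0                                      # exact transposition, M = 2
print(round(critical_bandwidth(f_exact), 1))
print(within_deviation_limit(f_exact, f_exact + 5.0))     # small approximation error
print(within_deviation_limit(f_exact, 1320.0 + 1100.0))   # broadband linear shift by 1100 Hz
```

At 2640 Hz the critical bandwidth is roughly 411 Hz, so the limit is about 20 Hz: the 5 Hz error passes, while the 220 Hz error of the linear shift fails by an order of magnitude.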



When using the above transposition based method of spectral replication according to the present invention, the 
following important properties are achieved: 
- Normally, no frequency-domain overlap occurs between replicated harmonics and existing partials.
- The replicated partials are harmonically related to the partials of the input signal and will not give rise to any
annoying dissonance or artifacts.

- The spectral envelope of the replicated harmonics forms a smooth continuation of the input signal spectral 
envelope, perceptually matching the original envelope. 

Transposition based on time-variant pattern search prediction

Various ways to design the required transposers exist. Typical time-domain implementations expand the signal in
time by duplicating signal segments based on the pitch period. This signal is subsequently read out at a different
rate. Unfortunately, such methods are strictly dependent on pitch detection for accurate time splicing of the signal
segments. Furthermore, the constraint to work on pitch-period based signal segments makes them sensitive to
transients. Since the detected pitch period can be much longer than the actual transient, the risk of duplicating the
entire transient rather than just expanding it in time is obvious. Another type of time-domain algorithm obtains time
expansion/compression of speech signals by means of pattern search prediction ["Pattern Search
Prediction of Speech" R. Bogner, T. Li, Proc. ICASSP '89, Vol. 1, May 1989; "Time-Scale Modification of Speech
Based on a Nonlinear Oscillator Model" G. Kubin, W. B. Kleijn, IEEE, 1994]. This is a form of granular synthesis,
usually done by performing correlation of signal segments in order to determine the best splicing point. This means
that the segments need not be related to the pitch period, and accurate pitch detection is not required.
Nevertheless, problems with rapidly changing signal amplitudes remain in these methods, and high-quality
transitions tend to raise high computational demands. However, an improved time-domain pitch shifter/transposer
is now presented, where the use of transient detection and dynamic system parameters produces a more accurate
transposition for high transposition factors during both stationary (tonal or non-tonal) and transient sounds, at a
low computational cost.

Referring to the drawings, wherein like numerals indicate like elements, there is shown in Fig. 4 nine separate
modules: a transient detector 401, a window position adjuster 403, a codebook generator 405, a synchronisation
signal selector 407, a synchronisation position memory 409, a minimum difference estimator 411, an output
segment memory 413, a mix unit 415, and a down sampler 417. The input signal is fed to both the codebook

407, provided it has been connected to another transposer. If this synchronisation position is within the codebook, it
is used and an output segment is produced. Otherwise the codebook is sent to the minimum difference estimator 411,
which returns a new synchronisation position. The new output segment is windowed together with the previous
output segment in the mix module 415 and subsequently down sampled in module 417.

In order to clarify the explanation, a state space representation is introduced. Here, the statevectors, or granules,
represent the input and output signals. The input signal is represented by a statevector $\mathbf{x}(n)$:

$$\mathbf{x}(n) = \left[ x(n),\ x(n-D),\ x(n-2D),\ \ldots,\ x(n-(N-1)D) \right] \qquad (5)$$

which is obtained from N delayed samples of the input signal, where N is the dimension of the statevector and D is
the delay between the input samples used to build the vector. The granular mapping yields the sample x(n) following
each statevector $\mathbf{x}(n-1)$. This gives Eq. 6, where $a(\cdot)$ is the mapping:

$$x(n) = a(\mathbf{x}(n-1)) \qquad (6)$$
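Eq. (5) can be written out directly. A minimal sketch with a hypothetical helper name, treating the signal as a Python list indexed by n:

```python
def statevector(x: list[float], n: int, dim: int, delay: int) -> list[float]:
    """Eq. (5): [x(n), x(n-D), ..., x(n-(N-1)D)] with N = dim and D = delay."""
    if n - (dim - 1) * delay < 0 or n >= len(x):
        raise IndexError("not enough past samples to build the statevector")
    return [x[n - i * delay] for i in range(dim)]


signal = [float(v) for v in range(20)]
print(statevector(signal, n=10, dim=3, delay=2))   # samples 10, 8 and 6
```

In the terms of Eq. (6), the granular mapping associates `statevector(signal, n - 1, dim, delay)` with the sample `signal[n]` that follows it.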

In the present method, the granular mapping is used to determine the next output based on the former output, using a
state transition codebook. The codebook of length L is continuously rebuilt, containing the statevectors and the next
sample following each statevector. Each statevector is separated from its neighbour by K samples; this enables the
system to adjust the time resolution depending on the characteristics of the currently processed signal, where K
equal to one represents the finest resolution. The input signal segment used to build the codebook is chosen based on
the position of a possible transient and the synchronisation position in the previous codebook.

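This means that the mapping $a(\cdot)$ is, theoretically, evaluated for all transitions included in the codebook:

$$\begin{aligned}
\mathbf{x}(n-L) &\mapsto x(n-L+1) \\
\mathbf{x}(n-L+K) &\mapsto x(n-L+K+1) \\
&\;\;\vdots \\
\mathbf{x}(n-1) &\mapsto x(n)
\end{aligned} \qquad (7)$$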



With this transition codebook, the new output y(n) is calculated by searching for the statevector in the codebook
most similar to the current statevector $\mathbf{y}(n-1)$. This nearest-neighbour search is done by calculating the minimum
difference and gives the new output sample:

$$y(n) = a(\mathbf{y}(n-1)) \qquad (8)$$

However, the system is not limited to working on a sample-by-sample basis, but is preferably operated on a
segment-by-segment basis. The new output segment is windowed and added (mixed) with the previous output segment, and
subsequently down sampled. The pitch transposition factor is determined by the ratio of the input segment length
represented by the codebook and the output segment length read out of the codebook.
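Eqs. (7) and (8) can be sketched sample by sample (the text notes that the system preferably works segment by segment). The sketch assumes a sum of squared differences as the "minimum difference" measure and an exhaustive nearest-neighbour search; `build_codebook` and `predict_next` are hypothetical names:

```python
import math


def statevector(x, n, dim, delay):
    """Eq. (5): [x(n), x(n-D), ..., x(n-(N-1)D)]."""
    return [x[n - i * delay] for i in range(dim)]


def build_codebook(x, n, length, step, dim, delay):
    """Eq. (7): pairs (statevector x(m), next sample x(m+1)) for m = n-L, n-L+K, ... < n."""
    book = []
    for m in range(n - length, n, step):
        if m - (dim - 1) * delay >= 0:          # skip entries without enough history
            book.append((statevector(x, m, dim, delay), x[m + 1]))
    return book


def predict_next(codebook, current):
    """Eq. (8): next output sample from the codebook entry closest to the current statevector."""
    def difference(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[0], current))
    return min(codebook, key=difference)[1]


period = 16
x = [math.sin(2 * math.pi * n / period) for n in range(64)]
book = build_codebook(x, n=63, length=32, step=1, dim=4, delay=1)
y_state = statevector(x, 20, 4, 1)      # stands in for the current output statevector y(n-1)
print(predict_next(book, y_state), x[21])
```

For this periodic test signal the closest codebook entry is one full period (or two) later, so the predicted sample equals the true continuation `x[21]`.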

Returning to the drawings, flowcharts are presented in Fig. 5 and Fig. 6, displaying the cycle of operation of the
transposer. In 501 the input data is represented. A transient detection 503 is performed on a segment of the input
signal; the search for transients is performed on a segment length equal to the output segment length. If a transient is
found 505, the position of the transient is stored 507 and the parameters L (representing the codebook length), K
(representing the distance in samples between each statevector), and D (representing the delay between samples in
each statevector) are adjusted 509. The position of the transient is compared to the position of the previous output
segment 511, in order to determine whether the transient has been processed. If so 513, the position of the codebook
(window L) and the parameters K, L and D are adjusted 515. After the necessary parameter adjustments, based on
the outcome of the transient detection, the search for a new synchronisation, or splicing, point takes place 517. This
procedure is displayed in Fig. 6. First, a new synchronisation point is calculated based on the previous one 601,
according to:

$$\mathrm{Sync\_pos} = \mathrm{Sync\_pos\_old} + S \cdot M - S \qquad (9)$$

where Sync_pos and Sync_pos_old are the new and old synchronisation positions respectively, S is the length of the
input segment being processed, and M is the transposition factor. This synchronisation point is used to compare the
accuracy of the new splicing point with the accuracy of the old splicing point 603. If the match is as good as or
better than the previous one 605, this new synchronisation point is returned 607, provided it is within the codebook. If
not, a new synchronisation point is searched for in the loop 609. This is performed with a similarity measure, in this
case a minimum difference function 611; however, it is also possible to use correlation in the time or frequency
domain. If the position yields a better match than that of the previous position found 613, the synchronisation
position is stored 615. When all positions have been searched, the
synchronisation point obtained is stored 519 and a new segment is read out from the codebook 521 starting at the
given synchronisation point. This segment is windowed and added to the previous one 523, down sampled by the
transposition factor 525, and stored in the output buffer 527.
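The search of Fig. 6 can be sketched as follows. This is an illustration under stated assumptions: the codebook is a plain list of samples, the segment to be matched (standing in for the correlation segment taken from the previous output) is given directly, the similarity measure is a sum of squared differences, and the fallback loop tries every position in order. Function names are hypothetical.

```python
import math


def predict_sync(sync_old: int, s: int, m: int) -> int:
    """Eq. (9): Sync_pos = Sync_pos_old + S*M - S."""
    return sync_old + s * m - s


def difference(a: list[float], b: list[float]) -> float:
    """Minimum-difference measure: sum of squared sample differences (an assumption)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def find_sync(codebook: list[float], target: list[float], sync_old: int, s: int, m: int,
              prev_error: float) -> tuple[int, float]:
    """Try the predicted position first (601-607); otherwise search all positions (609-615)."""
    width = len(target)
    last = len(codebook) - width               # last start index that keeps the segment inside
    candidate = predict_sync(sync_old, s, m)
    if 0 <= candidate <= last:
        err = difference(codebook[candidate:candidate + width], target)
        if err <= prev_error:                  # as good as or better than before: accept
            return candidate, err
    best_pos, best_err = -1, float("inf")
    for pos in range(last + 1):
        err = difference(codebook[pos:pos + width], target)
        if err < best_err:                     # keep the best match found so far
            best_pos, best_err = pos, err
    return best_pos, best_err


book = [math.sin(2 * math.pi * n / 16) for n in range(64)]
target = book[10:18]
print(find_sync(book, target, sync_old=6, s=4, m=2, prev_error=0.5))   # predicted position 10 accepted
print(find_sync(book, target, sync_old=0, s=4, m=2, prev_error=0.0))   # predicted position 4 rejected, full search
```

When the prediction of Eq. (9) is good, only one difference is evaluated; the full loop runs only when it fails, which is where the reduction in computational complexity described below comes from.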

In Fig. 7 the behaviour of the system under transient conditions regarding the position of the codebook is illustrated.
Prior to the transient, the codebook 1 representing the input segment 1 is positioned "to the left" of segment 1.
Correlation segment 1 represents a part of the previous output and is used to find synchronisation point 1 in
codebook 1. When the transient is detected, and the point of the transient is processed, the codebook is moved
according to Fig. 7a and is stationary until the input segment currently being processed is once again "to the right"
of the codebook. This makes it impossible to duplicate the transient, since the system is not allowed to search for
synchronisation points prior to the transient.

Most pitch transposers, or time expanders, based on pattern search prediction give satisfactory results for speech and single-pitched material. However, their performance deteriorates rapidly for high complexity signals, like music, in particular at large transposition factors. The present invention offers several solutions for improved performance, thereby producing excellent results for any type of signal. Contrary to other designs, the system is time-variant, and the system parameters are based on the properties of the input signal and the parameters used during the previous operation cycle. The use of a transient detector controlling not only the codebook size and position, but also the properties of the state vectors included, is a very robust and computationally efficient method to avoid audible degradation during rapidly changing signal segments. Furthermore, alteration of the length of the signal segment being processed, which would increase the computational demands, is not required. Also, the present invention utilises a refined codebook search based on the results from the preceding search. This means that, contrary to an ordinary correlation of two signal segments, as is usually done in time-domain systems based on pattern search prediction, the most likely synchronisation positions are tried first instead of trying all positions consecutively. This new method for reducing the codebook search drastically reduces the computational complexity of the system. Further, when using several transposers, synchronisation position information can be shared among the transposers for further reduction of the computational complexity, as shown in the following implementation.

The time-domain transposers as explained above are used to implement the SBR-1 and SBR-2 systems according to the following, illustrative but not limiting, example. In Fig. 8 three time expansion modules are used in order to generate second, third and fourth order harmonics. Since, in this example, each time-domain expansion transposer works on a wideband signal, it is beneficial to adjust the spectral envelope of the source frequency range prior to transposition, considering that there will be no means to do so after the transpositions without adding a separate equaliser system. The spectral envelope adjusters 801, 803 and 805 each work on several filterbank channels. The gain of each channel in the envelope adjusters must be set so that the sum 813, 815, 817 at the output, after transposition, yields the desired spectral envelope. The transposers 807, 809 and 811 are interconnected in order to share synchronisation position information. This is based on the fact that, under certain conditions, a high correlation will occur between the synchronisation positions found in the codebook during correlation in the separate transposing units. Assume, as an example and again not limiting the scope of the invention, that the fourth order harmonic transposer works on a time frame basis half of that of the second order harmonic transposer but at twice the duty cycle. Assume further that the codebooks used for the two expanders are the same and that the synchronisation positions of the two time-domain expanders are labelled sync_pos4 and sync_pos2, respectively. This yields the following relation:

$$\mathrm{sync\_pos2} = \mathrm{sync\_pos4} - n \cdot 4S - \mathrm{sync\_offset}, \quad n = 1, 2, 3, 4, \ldots \qquad (10)$$

where

$$\mathrm{sync\_offset} = \mathrm{sync\_pos4} - \mathrm{sync\_pos2}, \quad n = 0,$$

and S is the length of the input segment represented by the codebook. This is valid as long as neither of the synchronisation position pointers reaches the end of the codebook. During normal operation n is increased by one for each time frame processed by the second order harmonic transposer, and when the codebook end is inevitably reached by either of the pointers, the counter n is set to n = 0, and sync_pos2 and sync_pos4 are computed individually. Similar results are obtained for the third order harmonic transposer when connected to the fourth order harmonic transposer.

The above-presented use of several interconnected time-domain transposers for the creation of higher order harmonics introduces a substantial computational reduction. Furthermore, the proposed use of time-domain transposers in connection with a suitable filterbank presents the opportunity to adjust the envelope of the created spectrum while maintaining the simplicity and low computational cost of a time-domain transposer, since these, more or less, may be implemented using fixed point arithmetic and solely additive/subtractive operations.

Other, illustrative but not limiting, examples of the present invention are: 

- the use of a time domain transposer within each subband in a subband filter bank, thus reducing the signal 
complexity for each transposer. 

- the use of a time domain transposer in combination with a frequency domain transposer, thus enabling the 
system to use different methods for transposition depending on the characteristics of the input signal being 
processed. 

- the use of a time domain transposer in a wideband speech codec, operating on for instance the residual signal 
obtained after linear prediction. 

It should be recognised that the method outlined above may be advantageously used for time-scale modification only, by simply omitting the sample rate conversion. Further, it is understood that although the outlined method focuses on pitch transposing to a higher pitch, i.e. time expansion, the same principles apply when transposing to a lower pitch, i.e. time compression, as is obvious to those skilled in the art.
Filter bank based transposition

Various new and innovative filter bank based transposition techniques will now be described. The signal to be transposed is divided into a series of BP- or subband signals. The subband signals are then transposed, exactly or approximately, which is advantageously accomplished by a reconnection of analysis and synthesis subbands, hereinafter referred to as a "patch". The method is first demonstrated using a Short Time Fourier Transform, STFT.

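The N-point STFT of a discrete-time signal x(n) is defined by

$$X_k(n) = \sum_{m=-\infty}^{\infty} h(n-m)\,x(m)\,e^{-j\omega_k m} \qquad (12)$$

where k = 0, 1, ..., N-1, $\omega_k = 2\pi k/N$ and h(n) is a window. If the window satisfies the condition

$$h(n) = 0 \quad \text{for } n = \pm N, \pm 2N, \pm 3N, \ldots \qquad (13)$$

an inverse transform exists and is given by

$$x(n) = \frac{1}{N\,h(0)} \sum_{k=0}^{N-1} X_k(n)\,e^{j\omega_k n} \qquad (14)$$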

The direct transform may be interpreted as an analyser, see Fig. 9a, consisting of a bank of N BP-filters with impulse responses $h(n)\,e^{j\omega_k n}$ 901 followed by a bank of multipliers with carriers $e^{-j\omega_k n}$ 903, which shift the BP-signals down to regions around 0 Hz, forming the N analysis signals $X_k(n)$. The window acts as a prototype LP-filter. The signals $X_k(n)$ have small bandwidths and are normally downsampled 905. Eq. 12 thus need only be evaluated at n = rR, where R is the decimation factor and r is the new time variable. $X_k(n)$ can be recovered from $X_k(rR)$ by upsampling, see Fig. 9b, i.e. insertion of zeros 907 followed by LP-filtering 909. The inverse transform may be interpreted as a synthesiser consisting of a bank of N multipliers with carriers $\frac{1}{N h(0)}\,e^{j\omega_k n}$ 911 that shift the signals $X_k(n)$ up to their original frequencies, followed by stages 913, Fig. 9c, that add the contributions from all channels. The STFT and ISTFT may be rearranged in order to use the DFT and IDFT, which makes the use of FFT algorithms possible ["Implementation of the Phase Vocoder using the Fast Fourier Transform", M. R. Portnoff, IEEE ASSP, Vol. 24, No. 3, 1976].
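As a numerical check of Eqs. 12 to 14, the sketch below evaluates the analyser at every n (no decimation, R = 1) and resynthesises with Eq. 14. The window is an assumption made for illustration: a Hann-tapered sinc stored for n = -P..P, which satisfies Eq. 13 because sinc(n/N) vanishes at every nonzero multiple of N. The helper names are hypothetical.

```python
import numpy as np


def window(N, P):
    """Assumed prototype LP filter h(n), n = -P..P: a Hann-tapered sinc.
    sinc(n/N) is zero at n = +-N, +-2N, ..., so Eq. (13) holds."""
    n = np.arange(-P, P + 1)
    return np.sinc(n / N) * np.hanning(2 * P + 3)[1:-1]


def stft(x, h, N):
    """Analyser of Eq. (12), evaluated at every n (no decimation)."""
    P = (len(h) - 1) // 2
    m = np.arange(len(x))
    X = np.empty((N, len(x)), dtype=complex)
    for k in range(N):
        u = x * np.exp(-2j * np.pi * k * m / N)    # x(m) exp(-j w_k m)
        X[k] = np.convolve(u, h)[P:P + len(x)]     # sum_m h(n - m) x(m) exp(-j w_k m)
    return X


def istft(X, h, N):
    """Synthesiser of Eq. (14): x(n) = 1/(N h(0)) sum_k X_k(n) exp(j w_k n)."""
    P = (len(h) - 1) // 2
    n = np.arange(X.shape[1])
    k = np.arange(N)[:, None]
    return (X * np.exp(2j * np.pi * k * n / N)).sum(axis=0) / (N * h[P])


rng = np.random.default_rng(0)
x = rng.standard_normal(200)
h = window(16, 32)
x_rec = istft(stft(x, h, 16), h, 16)
print(np.max(np.abs(x_rec - x)))    # close to machine precision
```

Summing over k collapses to $N\sum_r h(rN)\,x(n-rN)$, so only the r = 0 term survives when Eq. 13 holds; that is why the division by $N h(0)$ restores x(n) exactly.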

Fig. 9c shows a patch 915 for generation of second harmonics, M = 2, with N = 32. For the sake of simplicity only channels 0 through 16 are shown. The centre frequency of BP 16 equals the Nyquist frequency; channels 17 through 31 correspond to negative frequencies. The blocks denoted P 917 and the gain blocks 919 will be described later and should presently be considered shorted out. The input signal is in this example bandlimited so that only channels 0 through 7 contain signals. Analyser channels 8 through 16 are thus empty and need not be mapped to the synthesiser. Analyser channels 0 through 7 are connected to synthesiser channels 0 through 7, which reproduce the input signal. Analyser channels k, where $4 \le k \le 7$, are also connected to synthesiser channels Mk, M = 2, which shift the signals to frequency regions at two times the centre frequencies of BP filters k. Hence the signals are upshifted to their original ranges as well as transposed one octave up. To explore the harmonic generation in terms of real-valued filter responses and modulators, the negative frequencies must also be considered, see the lower branch of Fig. 10a. Hence, the combined output of the remappings $k \rightarrow Mk$ 1001 and $N-k \rightarrow N-Mk$ 1003, where $4 \le k \le 7$, must be evaluated.

This yields

$$y_k(n) = \frac{2}{N\,h(0)}\Big\{\big[x(n) * h(n)\cos(\omega_k n)\big]\cos\big((M-1)\omega_k n\big) - \big[x(n) * h(n)\sin(\omega_k n)\big]\sin\big((M-1)\omega_k n\big)\Big\} \qquad (15)$$

where M = 2 and * denotes convolution. Eq. 15 may be interpreted as a BP-filtering of the input signal, followed by a linear frequency shift or Upper Side Band (USB) modulation, i.e. single sideband modulation using the upper sideband, see Fig. 10b, where 1005 and 1007 form a Hilbert transformer, 1009 and 1011 are multipliers with cosine and sine carriers, and 1013 is a difference stage which selects the upper sideband. Clearly, such a multiband BP and SSB method may be implemented explicitly, i.e. without filterbank patching, in the time or frequency domain, allowing arbitrary selection of individual passbands and oscillator frequencies.
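Eq. 15 can be evaluated directly to see basic multiband transposition at work. The sketch below (NumPy; the helper `usb_transpose` and the Hann-tapered sinc window are assumptions) band-pass filters around $\omega_k$ with the cosine and sine modulated window, then applies the difference stage of Fig. 10b. A tone at the channel centre lands exactly on $2\omega_k$, whereas a tone offset from the centre lands at $\omega_i + \omega_k$ rather than $2\omega_i$.

```python
import numpy as np


def usb_transpose(x, h, N, k, M):
    """Eq. (15) for one analysis channel k (hypothetical helper).

    h is the prototype window h(n), n = -P..P, stored as an array of length 2P+1.
    """
    P = (len(h) - 1) // 2
    i = np.arange(-P, P + 1)                                 # filter tap time index
    wk = 2 * np.pi * k / N
    n = np.arange(len(x))
    a = np.convolve(x, h * np.cos(wk * i))[P:P + len(x)]    # x * h cos(w_k n)
    b = np.convolve(x, h * np.sin(wk * i))[P:P + len(x)]    # x * h sin(w_k n)
    shift = (M - 1) * wk
    # difference stage (1013): keeps the upper sideband
    return 2.0 / (N * h[P]) * (a * np.cos(shift * n) - b * np.sin(shift * n))


N, P = 16, 32
h = np.sinc(np.arange(-P, P + 1) / N) * np.hanning(2 * P + 3)[1:-1]
n = np.arange(512)
centre = usb_transpose(np.cos(2 * np.pi * 96 * n / 512), h, N, k=3, M=2)
offset = usb_transpose(np.cos(2 * np.pi * 100 * n / 512), h, N, k=3, M=2)
print(np.argmax(np.abs(np.fft.rfft(centre))))   # 192: exact second harmonic
print(np.argmax(np.abs(np.fft.rfft(offset))))   # 196, not 200: only approximate
```

With 512 samples, bin 96 is $\omega_k$ for k = 3 and N = 16, so the tone at bin 100 is moved by $\omega_k$ (96 bins) instead of by its own frequency.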

According to Eq. 15, a sinusoid with the frequency $\omega_i$ within the passband of analysis channel k yields a harmonic at the frequency $\omega_i + (M-1)\omega_k$. Hence the method, referred to as basic multiband transposition, only generates exact harmonics for input signals with frequencies $\omega_i = \omega_k$, where $4 \le k \le 7$. However, if the number of filters is sufficiently large, the deviation from an exact transposition is negligible, see Eq. 4. Further, the transposition is made exact for quasi-stationary tonal signals of arbitrary frequencies by inserting the blocks denoted P 917 (Fig. 9c), provided every analysis channel contains at most one partial. In this case the signals $X_k(rR)$ are complex exponentials with frequencies equal to the differences between the partial frequencies $\omega_i$ and the centre frequencies $\omega_k$ of the analysis filters. To obtain the exact transposition M, these frequencies must be increased by a factor M, modifying the above frequency relationship to $\omega_i \rightarrow M\omega_k + M(\omega_i - \omega_k) = M\omega_i$. The frequencies of $X_k(rR)$ are equal to the time derivatives of their respective unwrapped phase angles and may be estimated using first order differences of successive phase angles. The frequency estimates are multiplied by M, and synthesis phase angles are calculated using those new frequencies. However, the same result, aside from a phase constant, is obtained in a simplified way by multiplying the analysis arguments by M directly, eliminating the need for frequency estimation. This is described in Fig. 11, representing the blocks 917. Thus $X_k(rR)$, where $4 \le k \le 7$ in this example, are converted from rectangular to polar coordinates, illustrated by the blocks R -> P 1101. The arguments are multiplied by M = 2 1103 and the magnitudes are unaltered. The signals are then converted back to rectangular coordinates (P -> R) 1105, forming the phase-corrected signals, which are fed to synthesiser channels according to Fig. 9c. This improved multiband transposition method thus has two stages: the patch provides a coarse transposition, as in the basic method, and the phase-multipliers provide fine frequency corrections. The above multiband transposition methods differ from traditional pitch shifting techniques using the STFT, where lookup-table oscillators are used for the synthesis or, when the ISTFT is used for the synthesis, the signal is time-stretched and decimated, i.e. no patch is used.
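The phase multiplier of Fig. 11 (blocks 1101 to 1105) reduces to one line of complex arithmetic. In this sketch the name `phase_multiply` and the numeric values are hypothetical; decimation is ignored, so the time index counts channel samples directly. The check confirms that a channel signal rotating at $\omega_i - \omega_k$ leaves the multiplier rotating at $M(\omega_i - \omega_k)$, so that after the synthesiser carrier at $M\omega_k$ the partial lands on $M\omega_i$.

```python
import numpy as np


def phase_multiply(X, M):
    """Blocks 1101-1105: rectangular -> polar, argument times M, polar -> rectangular.

    The magnitude is left unaltered. For integer M the 2*pi wraps of np.angle
    are harmless, since M * 2*pi is a multiple of 2*pi.
    """
    return np.abs(X) * np.exp(1j * M * np.angle(X))


n = np.arange(64)                        # channel time index (decimation ignored)
wk, d, M = 2 * np.pi * 5 / 32, 0.05, 3   # channel centre, partial offset, factor
X = 0.7 * np.exp(1j * d * n)             # channel signal of a partial at w_i = w_k + d
Y = phase_multiply(X, M)                 # now rotates at M * d
# the synthesiser carrier at M * w_k moves the partial to M * (w_k + d) = M * w_i
partial = Y * np.exp(1j * M * wk * n)
```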

The harmonic patch of Fig. 9c is easily modified for transposition factors other than two. Fig. 12 shows a patch 1203 for generation of 3rd order harmonics, where 1201 are the analysis channels and 1205 are the synthesis channels. Different harmonic orders may be created simultaneously, as shown in Fig. 13, where 2nd and 3rd order harmonics are used. Fig. 14 illustrates a non-overlapping combination of 2nd, 3rd and 4th order harmonics. The lowest possible harmonic number is used as high in frequency as possible. Above the upper limit of the destination range of harmonic M, harmonic M+1 is used. Fig. 15 demonstrates a method of mapping all synthesiser channels (N = 64, channels 0-32 shown). All highband channels with non-prime-number indices are mapped according to the following relation between source and destination channel number: $k_{\mathrm{dest}} = M k_{\mathrm{source}}$, where M is the smallest integer that satisfies the condition that $k_{\mathrm{source}}$ lies in the lowband and $k_{\mathrm{dest}}$ in the highband. Hence, no synthesiser channel receives signal from more than one analysis channel. Prime-number highband channels may be mapped to $k_{\mathrm{source}} = 1$, or to lowband channels $k_{\mathrm{source}} > 1$ that yield good approximations of the above relation. (Only non-prime-number connections with M = 2, 3, 4, 5 are shown in Fig. 15.)
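The Fig. 15 mapping rule can be expressed as a small search over integer factors. The sketch assumes a lowband of channels 1 to `k_low - 1`; the value `k_low = 8` in the usage lines is illustrative and not read from Fig. 15, and `source_channel` is a hypothetical helper. For a prime destination channel the only divisor left is the channel index itself, which reproduces the fallback $k_{\mathrm{source}} = 1$.

```python
def source_channel(k_dest, k_low):
    """Hypothetical helper for the Fig. 15 rule k_dest = M * k_source.

    k_low is the first highband channel (assumed lowband: channels 1 .. k_low-1).
    Returns (k_source, M) for the smallest integer M giving a lowband source.
    A prime k_dest only divides by itself, which yields k_source = 1.
    """
    for M in range(2, k_dest + 1):
        if k_dest % M == 0 and k_dest // M < k_low:
            return k_dest // M, M
    raise ValueError("k_dest must be a highband channel index >= 2")


k_low = 8                                   # illustrative value, not from Fig. 15
patch = {k: source_channel(k, k_low) for k in range(k_low, 33)}
print(patch[16], patch[25], patch[11])      # (4, 4) (5, 5) (1, 11)
```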

It is also possible to combine amplitude and phase information from different analyser channels. The amplitude signals $|X_k(rR)|$ may be connected according to Fig. 16, whereas the phase signals $\arg\{X_k(rR)\}$ are connected according to the principle of Fig. 16. In this way the lowband frequencies will still be transposed, whereby a periodic repetition of the source region envelope is generated instead of the stretched envelope that results from a transposition according to Eq. 2. Gating or other means may be incorporated in order to avoid amplification of "empty" source channels. Fig. 17 illustrates another application, the generation of sub-harmonics to a highpass filtered or bass limited signal by using connections from higher to lower subbands. When using the above transpositions it may be beneficial to employ adaptive switching of patches based on the characteristics of the signal.

In the above description it was assumed that the highest frequency contained in the input signal was significantly lower than the Nyquist frequency. Thus, it was possible to perform a bandwidth expansion without an increase in sample rate. This is, however, not always the case, so a preceding upsampling may be necessary. When using filter bank methods for transposition, it is possible to integrate upsampling in the process.

Most perceptual codecs employ maximally decimated filter banks in the time to frequency mapping ["Introduction to Perceptual Coding", K. Brandenburg, AES, Collected Papers on Digital Audio Bitrate Reduction, 1996]. Fig. 18a shows the basic structure of a perceptual encoder system. The analysis filter bank 1801 splits the input signal into several subband signals. The subband samples are individually quantised 1803, using a reduced number of bits, where the number of quantisation levels is determined from a perceptual model 1807 which estimates the minimum masking threshold. The subband samples are normalised, coded with optional redundancy coding methods and combined with side information consisting of the normalisation factors, bit-allocation information and other codec specific data 1805, to form the serial bit stream. The bit stream is then stored or transmitted. In the decoder, Fig. 18b, the coded bitstream is demultiplexed 1809, decoded, and the subband samples are requantised to an equal number of bits 1811. A synthesis filter bank combines the subband samples in order to recreate the original signal 1813. Implementations using maximally decimated filter banks will drastically reduce computational costs. In the following descriptions, there is a focus on cosine modulated filter banks. It should be appreciated, however, that the invention can be implemented using other types of filter banks or transforms, including filter bank interpretations of the wavelet transform, other non-equal bandwidth filter banks or transforms, and multi-dimensional filter banks or transforms.



In the illustrative, but not limiting, descriptions below it is assumed that an L-channel cosine modulated filter bank splits the input signal x(n) into L subband signals. The generic structure of a maximally decimated filter bank is shown in Fig. 19. The analysis filters are denoted $H_k(z)$ 1901, where k = 0, 1, ..., L-1. The subband signals $v_k(n)$ are maximally decimated 1903, each of sampling frequency $f_s/L$, where $f_s$ is the sampling frequency of x(n). The synthesis section reassembles the subband signals after interpolation 1905 and filtering 1907 to produce $\hat{x}(n)$. The synthesis filters are denoted $F_k(z)$. In addition, the present invention performs a spectral replication on $\hat{x}(n)$, giving an enhanced signal y(n).

Synthesising the subband signals with a QL-channel filter bank, where only the L lowband channels are used and the bandwidth expansion factor Q is chosen so that QL is an integer value, will result in an output signal with sampling frequency $Qf_s$. Hence, the extended filter bank will act as if it were an L-channel filter bank followed by an upsampler. Since, in this case, the L(Q-1) highband filters are unused (fed with zeros), the audio bandwidth will not change; the filter bank will merely reconstruct an upsampled version of $\hat{x}(n)$. If, however, the L subband signals are patched to the highband filters, the bandwidth of $\hat{x}(n)$ will be increased by a factor Q, producing y(n). This is the maximally decimated filter bank version of the basic multiband transposer, according to the invention. Using this scheme, the upsampling process is integrated in the synthesis filtering, as explained earlier. It should be noted that any size of the synthesis filter bank may be used, resulting in different sample rates of the output signal, and hence different bandwidth expansion factors. Performing spectral replication on $\hat{x}(n)$ according to the basic multiband transposition method of the present invention, with an integer transposition factor M, is accomplished by patching the subband signals as
$$v_{Mk}(n) = e_{Mk}(n)\,(-1)^{(M-1)kn}\,v_k(n), \qquad (16)$$

where $k \in [0, L-1]$ is chosen so that $Mk \in [L, QL-1]$, $e_{Mk}(n)$ is the envelope correction and $(-1)^{(M-1)kn}$ is a correction factor for spectrally inverted subbands. Spectral inversion results from decimation of subband signals, and the inverted signals may be reinverted by changing sign on every second sample in those channels. Referring to Fig. 20, consider a 16-channel synthesis filter bank, patched 2009 for a transposition factor M = 2, with Q = 2. The blocks 2001 and 2003 denote the analysis filters $H_k(z)$ and the decimators of Fig. 19 respectively. Similarly, 2005 and 2007 are the interpolators and synthesis filters $F_k(z)$. Eq. 16 then simplifies to patching the four upper frequency subband signals of the received data into every second of the eight uppermost channels in the synthesis filter bank. Due to spectral inversion, every second patched subband signal must be frequency inverted before the synthesis. Additionally, the magnitudes of the patched signals must be adjusted 2011 according to the principles of SBR-1 or SBR-2.



Using the basic multiband transposition method according to the present invention, the generated harmonics are in general not exact multiples of the fundamentals. All frequencies but the lowest in every subband differ to some extent from an exact transposition. Further, the replicated spectrum contains zeros, since the target interval covers a wider frequency range than the source interval. Moreover, the alias cancellation properties of the cosine modulated filter bank vanish, since the subband signals are separated in frequency in the target interval. That is, neighbouring subband signals do not overlap in the highband area. However, aliasing reduction methods known to those skilled in the art may be used to reduce this type of artifact. Advantages of this transposition method are ease of implementation and the very low computational cost.

WO 98/57436 PCT/IB98/00893

To achieve perfect transposition of sinusoids, an effective maximally decimated filter bank solution of the improved multiband transposition method is now presented. The system uses an additional modified analysis filter bank, while the synthesis filter bank is cosine modulated as described by Vaidyanathan ["Multirate Systems and Filter Banks", P. P. Vaidyanathan, Prentice Hall, Englewood Cliffs, New Jersey, 1993, ISBN 0-13-605718-7]. The steps for operation, using the improved multiband transposition method according to the present invention, based on maximally decimated filter banks, are shown schematically in Fig. 21 and in the flowchart of Fig. 22, and are as follows:



1. The L received subband signals are synthesised with a QL-channel filter bank 2101, 2201, 2203, where the L(Q-1) upper channels are fed with zeros, to form the signal $x_1(n)$, which is thus oversampled by the bandwidth expansion factor Q.

2. $x_1(n)$ is downsampled by a factor Q to form the signal $x_2(n')$ 2103, 2205, i.e. $x_2(n') = x_1(Qn')$.

3. An integer-valued K is chosen so that T = KM/Q is an integer, where T is the size of the modified analysis filter bank and K is the size of the synthesis filter bank of step 7. K should preferably be chosen large for stationary (tonal) signals, and smaller for dynamic (transient) signals.

4. $x_2(n')$ is filtered through a T-channel modified analysis filter bank 2107, 2213, where the T analysis filters are exponentially modulated, producing a set of complex-valued subband signals. The subband signals are downsampled by a factor T/M, giving the subband signals $v_k^{(M)}(n'')$, k = 0, 1, ..., T-1. Hence, the filter bank will be oversampled by a factor M.

5. The samples $v_k^{(M)}(n'')$ are converted to a polar representation (magnitude and phase angle). The phase angles are multiplied by the factor M and the samples are converted back to a rectangular representation according to the scheme of Fig. 11. The real parts of the complex-valued samples are taken, giving the signals $s_k^{(M)}(n'')$ 2109, 2215. After this operation, the signals $s_k^{(M)}(n'')$ are critically sampled.

6. The gains of the signals $s_k^{(M)}(n'')$ are adjusted 2217.

7. The subband signals $s_k^{(M)}(n'')$, where $k \in [T/M, \min(K,T)-1]$, are synthesised with an ordinary cosine modulated K-channel filter bank, where channels 0 through T/M-1 are fed with zeros 2105, 2221. This produces the signal $x_3(n)$.

8. $x_3(n)$ is finally added to $x_1(n)$ to give y(n) 2223, which is the desired spectrally replicated signal.

Steps 3 to 6 may be repeated for different values of the transposition factor M, thus adding multiple harmonics to $x_1(n)$. This mode of operation is illustrated by the dotted figures of Fig. 21, and in Fig. 22 by iterating the loop over boxes 2211-2219. In this case, K is chosen so as to make T integer-valued for all choices of M; for integer-valued M, K is preferably chosen so as to make K/Q a positive integer. All subband signals $s_k^{(M_i)}(n'')$, where i = 1, 2, ..., m and m is the number of transposition factors, are added according to

$$s_k(n'') = \sum_{i=1}^{m} s_k^{(M_i)}(n'') \qquad (17)$$

for every applicable k. In the first iteration of the loop of Fig. 22, the signals $s_k(n'')$ may be considered to be subband samples of zeros only, where k = 0, 1, ..., K-1. In every loop, the new subband samples are added as

$$s_k(n'') = s_k(n'') + s_k^{(M_i)}(n''), \qquad (18)$$

where k = K/Q, K/Q+1, ..., min(K,T)-1. The subband signals $s_k(n'')$ are synthesised once with a K-channel filter bank according to step 7.
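The bookkeeping of steps 3 to 7 and Eqs. 17 to 18 can be checked independently of any particular filter bank design. In this sketch (hypothetical helpers `transposer_plan` and `accumulate`; the filter banks themselves are not implemented), T = KM/Q and the decimation factor T/M are computed for each transposition factor, together with the synthesis channels $k \in [T/M, \min(K,T)-1]$ that each transposer feeds.

```python
from fractions import Fraction

import numpy as np


def transposer_plan(K, Q, Ms):
    """Sizes and channel ranges for steps 3-7 (hypothetical helper).

    K  -- channels of the cosine modulated synthesis bank of step 7
    Q  -- bandwidth expansion factor (int or Fraction)
    Ms -- transposition factors M_i
    """
    Q = Fraction(Q)
    plan = {}
    for M in Ms:
        T = Fraction(K * M) / Q                   # size of the modified analysis bank
        if T.denominator != 1 or (T / M).denominator != 1:
            raise ValueError(f"K = {K} gives non-integer T or T/M for M = {M}")
        T = int(T)
        plan[M] = {"T": T, "decimation": T // M,  # step 4 downsampling factor T/M
                   "channels": range(T // M, min(K, T))}
    return plan


def accumulate(K, n_samples, parts):
    """Eqs. (17)-(18): sum the s_k^(M_i) of all transposers into K subband signals.

    parts -- list of (channels, array of shape (len(channels), n_samples))
    """
    s = np.zeros((K, n_samples))                  # first loop: zeros only
    for channels, s_M in parts:
        s[channels.start:channels.stop] += s_M    # Eq. (18)
    return s


plan = transposer_plan(K=32, Q=2, Ms=[2, 3])
parts = [(p["channels"], np.ones((len(p["channels"]), 4))) for p in plan.values()]
s = accumulate(32, 4, parts)
```

Because T/M = K/Q, every transposer starts feeding the K-channel bank at the same channel, which is why a single synthesis pass in step 7 suffices.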
The modified analysis filter bank of step 4 is derived through the theory of cosine modulated filter banks, where the modulated lapped transform (MLT) ["Lapped Transforms for Efficient Transform/Subband Coding", H. S. Malvar, IEEE Trans. ASSP, vol. 38, no. 6, 1990] is a special case. The impulse responses $h_k(n)$ of the filters in a T-channel cosine modulated filter bank may be written

$$h_k(n) = C\,p_0(n)\cos\Big(\frac{\pi}{2T}(2k+1)\Big(n-\frac{N-1}{2}\Big)+\Phi_k\Big), \qquad (19)$$

where k = 0, 1, ..., T-1, N is the length of the lowpass prototype filter $p_0(n)$, C is a constant and $\Phi_k$ is a phase angle that ensures alias cancellation between adjacent channels. The constraints on $\Phi_k$ are

$$\Phi_0 = \pm\frac{\pi}{4}, \quad \Phi_{T-1} = \pm\frac{\pi}{4} \quad \text{and} \quad \Phi_k = \Phi_{k-1} \pm \frac{\pi}{2}, \qquad (20)$$

which may be simplified to the closed form expression

$$\Phi_k = \pm(-1)^k\,\frac{\pi}{4}. \qquad (21)$$

With this choice of $\Phi_k$, perfect reconstruction systems or approximate reconstruction systems (pseudo-QMF systems) may be obtained using synthesis filter banks with impulse responses

$$f_k(n) = C\,p_0(n)\cos\Big(\frac{\pi}{2T}(2k+1)\Big(n-\frac{N-1}{2}\Big)-\Phi_k\Big). \qquad (22)$$

Consider the filters

$$h'_k(n) = C\,p_0(n)\sin\Big(\frac{\pi}{2T}(2k+1)\Big(n-\frac{N-1}{2}\Big)+\Phi_k\Big), \qquad (23)$$

where $h'_k(n)$ are sine-modulated versions of the prototype filter $p_0(n)$. The filters $H'_k(z)$ and $H_k(z)$ have identical passband supports, but the phase responses differ. The passbands of the filters are actually Hilbert transforms of each other (this is not valid for frequencies close to $\omega = 0$ and $\omega = \pi$). Combining Eq. 19 and Eq. 23 according to

$$h^{a}_k(n) = h_k(n) + j\,h'_k(n) = C\,p_0(n)\exp\Big(j\Big[\frac{\pi}{2T}(2k+1)\Big(n-\frac{N-1}{2}\Big)+\Phi_k\Big]\Big) \qquad (24)$$

yields filters that have the same shape of the magnitude responses as $H_k(z)$ for positive frequencies but are zero for negative frequencies. Using a filter bank with impulse responses as in Eq. 24 gives a set of subband signals that may be interpreted as the analytic (complex) signals corresponding to the subband signals obtained from a filter bank with impulse responses as in Eq. 19. Analytic signals are suitable for manipulation, since the complex-valued samples may be written in a polar form, that is $v(n) = \mathrm{real}\{v(n)\} + j\,\mathrm{imag}\{v(n)\} = |v(n)|\exp(j\arg\{v(n)\})$. However, when using the complex filter bank for transposition, the constraint on $\Phi_k$ has to be generalised to retain the alias cancellation property. The new constraint on $\Phi_k$, to ensure alias cancellation in combination with a synthesis filter bank with impulse responses as in Eq. 22, is

$$\Phi_k = \pm(-1)^k\,\frac{\pi}{4M}, \qquad (25)$$

which simplifies to Eq. 21 when M = 1. With this choice, transposed partials will have the same relative phases as they would have when M = 1 (no transposition). Combining Eq. 24 and Eq. 25 results in

$$h^{a}_k(n) = C\,p_0(n)\exp\Big(j\Big[\frac{\pi}{2T}(2k+1)\Big(n-\frac{N-1}{2}\Big)\pm(-1)^k\frac{\pi}{4M}\Big]\Big), \qquad (26)$$

which are the filters used in the modified filter bank of step 4, according to the present invention.
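The relative-phase property of Eq. 25 can be verified numerically from the filters obtained by combining Eq. 24 and Eq. 25. The sketch assumes a sine-window prototype of length N = 2T and C = 1 (neither is fixed by the text) and evaluates the frequency responses at a single tone. Midway between two channel centres the responses differ in phase by $\pi/(2M)$ and the Eq. 32 dot product is positive; for a tone in the first side lobe of channel k the difference becomes $\pi(1 - 1/(2M))$ and the dot product turns negative.

```python
import numpy as np


def complex_analysis_bank(T, M, sign=1.0):
    """Exponentially modulated filters: Eq. (24) with Phi_k from Eq. (25).

    Assumptions: sine-window prototype p0 of length N = 2T, C = 1.
    Returns a (T, N) array, one impulse response per row.
    """
    N = 2 * T
    n = np.arange(N)
    p0 = np.sin(np.pi * (n + 0.5) / N)
    k = np.arange(T)[:, None]
    phi = sign * (-1.0) ** k * np.pi / (4 * M)
    return p0 * np.exp(1j * (np.pi / (2 * T) * (2 * k + 1) * (n - (N - 1) / 2) + phi))


def response(h, w):
    """Frequency response of every filter at radian frequency w."""
    return h @ np.exp(-1j * w * np.arange(h.shape[1]))


def dot(a, b):
    """Eq. (32); a negative value flags a phase inversion."""
    return a.real * b.real + a.imag * b.imag


T, M, k = 16, 2, 5
h = complex_analysis_bank(T, M)
w_k = np.pi * (2 * k + 1) / (2 * T)            # centre frequency of channel k
H_mid = response(h, w_k + np.pi / (2 * T))     # tone midway between k and k + 1
H_lobe = response(h, w_k + 2 * np.pi / T)      # tone in the first side lobe of k
```

The prototype is symmetric, so each response factors into $e^{j\Phi_k}$, a common linear-phase term and a real amplitude; the phase difference between neighbours is therefore exactly $\Phi_{k+1} - \Phi_k$ unless one amplitude has changed sign at a zero of the prototype.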

Some clarifications concerning step 5: downsampling the complex-valued subband signals by a factor T/M makes them oversampled by M, which is an essential criterion when the phase angles subsequently are multiplied by the transposition factor M. The oversampling forces the number of subband samples per bandwidth, after transposition to the target range, to equal that of the source range. The individual bandwidths of the transposed subband signals are M times greater than those in the source range, due to the phase-multiplier. This makes the subband signals critically sampled after step 5, and additionally, there will be no zeros in the spectrum when transposing tonal signals.

In order to avoid trigonometric calculations, that is, having to compute the new subband signals as

$$s_k^{(M)}(n'') = \mathrm{real}\Big\{|v_k^{(M)}(n'')|\exp\Big(jM\arctan\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\Big)\Big\} = |v_k^{(M)}(n'')|\cos\Big(M\arctan\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\Big), \qquad (27)$$

where $|v_k^{(M)}(n'')|$ is the absolute value of $v_k^{(M)}(n'')$, the following trigonometric relationship is used:

$$\cos(M\alpha) = \cos^M(\alpha) - \binom{M}{2}\sin^2(\alpha)\cos^{M-2}(\alpha) + \binom{M}{4}\sin^4(\alpha)\cos^{M-4}(\alpha) - \ldots \qquad (28)$$

Letting

$$\alpha = \arctan\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}} \qquad (29)$$

and noting that

$$\cos(\alpha) = \cos\Big(\arctan\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\Big) = \frac{\mathrm{real}\{v_k^{(M)}(n'')\}}{|v_k^{(M)}(n'')|} \qquad (30)$$

and

$$\sin(\alpha) = \sin\Big(\arctan\frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{\mathrm{real}\{v_k^{(M)}(n'')\}}\Big) = \frac{\mathrm{imag}\{v_k^{(M)}(n'')\}}{|v_k^{(M)}(n'')|}, \qquad (31)$$

the computations of step 5 may be accomplished without trigonometric calculations, reducing computational complexity.
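A sketch of the trig-free step 5 follows, assuming NumPy and Python's `math.comb`; the helper name `phase_multiply_real` is hypothetical. It expands $\cos(M\alpha)$ with Eq. 28 using $\cos\alpha$ and $\sin\alpha$ from Eqs. 30 and 31. Here $\alpha$ is taken as the four-quadrant argument of v, because a plain arctan of the ratio imag/real loses the quadrant and would flip the sign for odd M whenever real{v} < 0. Only a magnitude (a square root) is needed, no trigonometric calls.

```python
from math import comb

import numpy as np


def phase_multiply_real(v, M):
    """Step 5 without trigonometric calls: |v| cos(M alpha) via Eqs. (28), (30), (31).

    alpha is the four-quadrant argument of v, so cos(alpha) = real/|v| and
    sin(alpha) = imag/|v| hold in every quadrant.
    """
    mag = np.abs(v)
    safe = mag > 0
    c = np.divide(v.real, mag, out=np.zeros_like(mag), where=safe)   # Eq. (30)
    s = np.divide(v.imag, mag, out=np.zeros_like(mag), where=safe)   # Eq. (31)
    # Eq. (28): cos(M a) = sum over even i of (-1)^(i/2) C(M, i) sin^i(a) cos^(M-i)(a)
    cos_Ma = sum((-1) ** (i // 2) * comb(M, i) * s ** i * c ** (M - i)
                 for i in range(0, M + 1, 2))
    return mag * cos_Ma


rng = np.random.default_rng(1)
v = rng.standard_normal(8) + 1j * rng.standard_normal(8)
reference = np.real(np.abs(v) * np.exp(1j * 3 * np.angle(v)))   # Eq. (27) with trig
print(np.allclose(phase_multiply_real(v, 3), reference))
```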
When using transpositions where M is even, obstacles with the phase-multiplier may arise, depending on the characteristics of the lowpass prototype filter $p_0(n)$. All applicable prototype filters have zeros on the unit circle in the z-plane. A zero on the unit circle imposes a 180° shift in the phase response of the filter. For M even, the phase-multiplier translates these shifts to 360° shifts, i.e. the phase-shifts vanish. The partials so located in frequency that such phase-shifts vanish will give rise to aliasing in the synthesised signal. The worst case scenario is when a partial is located at a point in frequency corresponding to the top of the first side lobe of an analysis filter. Depending on the rejection of this lobe in the magnitude response, the aliasing will be more or less audible. As an example, the first side lobe of the prototype filter used in the ISO/MPEG layer 1 and 2 standard is rejected by 96 dB, while the rejection is only 23 dB for the first side lobe of the sine window used in the MDCT scheme of the ISO/MPEG layer 3 standard. It is clear that this type of aliasing, using the sine window, will be audible. A solution to this problem will be presented, and is referred to as relative phase locking.

The filters of the modified analysis filter bank all have linear phase responses. The phase angles $\Phi_k$ introduce relative phase differences between adjacent channels, and the zeros on the unit circle introduce 180° phase-shifts at locations in frequency that may differ between channels. By monitoring the phase difference between neighbouring subband signals before the phase-multiplier is activated, it is easy to detect the channels that contain phase-inverted information. Considering tonal signals, the phase difference is approximately $\pi/(2M)$, according to Eq. 25, for non-inverted signals, and consequently approximately $\pi(1 - 1/(2M))$ for signals where either of the signals is inverted. The detection of inverted signals may be accomplished simply by computing the dot product of samples in adjacent subbands as

$$v_k^{(M)}(n'') \circ v_{k+1}^{(M)}(n'') = \mathrm{real}\{v_k^{(M)}(n'')\}\,\mathrm{real}\{v_{k+1}^{(M)}(n'')\} + \mathrm{imag}\{v_k^{(M)}(n'')\}\,\mathrm{imag}\{v_{k+1}^{(M)}(n'')\}. \qquad (32)$$

If the product in Eq. 32 is negative, the phase difference is greater than 90°, and a phase-inversion condition is present. The phase angles of the complex-valued subband signals are multiplied by M, according to the scheme of step 5, and finally, the inversion-tagged signals are negated. The relative phase locking method thus forces the 180° shifted subband signals to retain this shift after the phase multiplication, and hence maintain the aliasing cancellation properties.
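Relative phase locking can be sketched on one time slot of the complex subband samples. The tagging rule here is an assumption, not stated in the text: a channel is tagged when the Eq. 32 product with its lower neighbour is negative, and tags are accumulated upward from channel 0 so that each tag is relative to channel 0. The negation is applied only for even M, since for odd M the phase-multiplier already maps a 180° shift to an odd multiple of 180°. The helper name `relative_phase_lock` is hypothetical.

```python
import numpy as np


def relative_phase_lock(v, M):
    """Sketch of relative phase locking for one time slot v[k], k = 0..T-1.

    Assumed tagging rule: channel k is flipped relative to k-1 when the Eq. (32)
    product is negative; flips are accumulated from channel 0 upward.
    """
    prod = v[:-1].real * v[1:].real + v[:-1].imag * v[1:].imag   # Eq. (32)
    flips = np.concatenate(([False], prod < 0))
    inverted = np.logical_xor.accumulate(flips)
    out = np.abs(v) * np.exp(1j * M * np.angle(v))              # step 5 phase multiplier
    if M % 2 == 0:     # even M turns 180 degrees into 360 degrees: restore the shift
        out[inverted] *= -1
    return out


v = np.array([np.exp(0.3j), -np.exp(1j * (0.3 + np.pi / 4))])  # channel 1 inverted
locked = relative_phase_lock(v, 2)
```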

Spectral envelope adjustment 

Most sounds, like speech and music, are characterised as products of slowly varying envelopes and rapidly varying carriers with constant amplitude, as described by Stockham ["The Application of Generalized Linearity to Automatic Gain Control", T. G. Stockham, Jr., IEEE Trans. on Audio and Electroacoustics, Vol. AU-16, No. 2, June 1968] and Eq. 1.



In split-band perceptual audio coders, the audio signal is segmented into frames and split into multiple frequency 
bands using subband filters or a time-to-frequency domain transform. In most codec types, the signal is 
subsequently separated into two major signal components for transmission or storage, the spectral envelope 
representation and the normalised subband samples or coefficients. Throughout the following description, the term 
"subband samples" or "coefficients" refers to sample values obtained from subband filters as well as coefficients 
obtained from a time-to-frequency transform. The term "spectral envelope" or "scale factors" refers to values of the
subbands on a time-frame basis, such as the average or maximum magnitude in each subband, used for
normalisation of the subband samples. However, the spectral envelope may also be obtained using linear prediction
(LPC) [U.S. Pat. 5,684,920]. In a typical codec, the normalised subband samples require coding at a high bitrate
(using approximately 90% of the available bitrate), compared to the slowly varying temporal envelopes, and thus the
spectral envelopes, which may be coded at a much-reduced rate (using approximately 10% of the available bitrate).
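The separation into a spectral envelope and normalised subband samples can be sketched as follows. This is a minimal illustration, assuming the maximum magnitude per band as the scale factor and omitting quantization; the function names and the `band_edges` layout are hypothetical.

```python
import numpy as np


def split_envelope(coeffs, band_edges):
    """Split one frame of coefficients into scale factors and normalised samples.

    band_edges holds the first coefficient index of each band plus the end index.
    The scale factor of a band is taken here as its maximum magnitude.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    scale_factors = []
    normalised = np.zeros_like(coeffs)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = coeffs[lo:hi]
        sf = np.max(np.abs(band))
        sf = sf if sf > 0 else 1.0  # avoid dividing by zero in silent bands
        scale_factors.append(sf)
        normalised[lo:hi] = band / sf
    return np.array(scale_factors), normalised


def merge_envelope(scale_factors, normalised, band_edges):
    """Decoder side: rescale the normalised samples by their band's scale factor."""
    out = np.array(normalised, dtype=float)
    for sf, lo, hi in zip(scale_factors, band_edges[:-1], band_edges[1:]):
        out[lo:hi] *= sf
    return out
```

Only one value per band and frame is needed for the envelope, whereas every coefficient must be coded for the normalised samples, which is why the bitrate split is so uneven.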

An accurate spectral envelope of the replicated bandwidth is important if the timbral qualities of the original signal are
to be preserved. The perceived timbre of a musical instrument, or voice, is mainly determined by the spectral
distribution below a frequency f_x located in […]. The spectral details above f_x are
of less importance, and consequently the highband fine structures obtained by the above transposition methods
require no adjustment, while the coarse structures generally do. In order to enable such adjustment, it is useful to
filter the spectral representation of the signal to separate the envelope coarse structure from the fine structure.
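One possible way to filter the spectral representation into coarse and fine structure is sketched below. It is an assumption, not necessarily the filter intended here: a moving average over the dB magnitude spectrum gives the coarse envelope, and the remainder is the fine structure. The function name and the `width` parameter are hypothetical.

```python
import numpy as np


def split_coarse_fine(magnitude, width=9):
    """Separate a magnitude spectrum into a coarse envelope and a fine structure (dB).

    The coarse envelope is a moving average of the log spectrum over `width`
    bins; the fine structure is what remains after removing it.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the output keeps its length")
    log_mag = 20.0 * np.log10(np.maximum(magnitude, 1e-12))
    kernel = np.ones(width) / width
    # Pad with edge values so the smoothed curve has the same length as the input.
    padded = np.pad(log_mag, width // 2, mode="edge")
    coarse_db = np.convolve(padded, kernel, mode="valid")
    fine_db = log_mag - coarse_db
    return coarse_db, fine_db
```

An envelope adjustment then keeps `fine_db` of the replicated highband unchanged and replaces its `coarse_db` with the desired envelope before converting back to linear magnitude.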

In the SBR-1 implementation according to the present invention, the highband coarse spectral envelope is estimated
from the lowband information available at the decoder. This estimation is performed by continuously monitoring the
envelope of the lowband and adjusting the highband spectral envelope according to specific rules. A novel method
to accomplish the envelope estimation uses asymptotes in a logarithmic frequency-magnitude space, which is
equivalent to curve fitting with polynomials of varying order in the linear space. The level and slope of an upper
portion of the lowband spectrum are estimated, and the estimates are used to define the level and slope of one or
several segments representing the new highband envelope. The asymptote intersections are fixed in frequency and
act as pivot points. Although not always necessary, it is beneficial to stipulate constraints to keep the highband
envelope excursions within realistic boundaries. An alternative approach to estimation of the spectral envelope is to
use vector quantization, VQ, of a large number of representative spectral envelopes, and to store these in a look-up
table or codebook. The vector quantizer is trained on a set of
training data, in this case audio spectral envelopes. The training is usually done with the Generalised Lloyd
Algorithm ["Vector Quantization and Signal Compression", A. Gersho, R. M. Gray, Kluwer Academic Publishers,
USA 1992, ISBN 0-7923-9181-0], and yields vectors that optimally cover the contents of the training data.
Considering a VQ codebook consisting of A spectral envelopes trained by B envelopes (B ≫ A), the A
envelopes represent the A most likely transitions from the lowband envelope to the highband envelope, based on B
observations of a wide variety of sounds. These are, theoretically, the A optimum rules for predicting the envelope
based on the B observations. When estimating a new highband spectral envelope, the original lowband envelope is
used to search the codebook, and the highband part of the best matching codebook entry is used to create the new
highband spectrum.
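Both estimation approaches can be sketched as below, under assumptions that are not taken from the text. The asymptote variant fits a single straight line in (log2 frequency, dB) space to the top `fit_octaves` of the lowband, pivots at the crossover frequency, and clamps the slope to hypothetical `slope_limits` as a realistic-boundary constraint. The VQ variant assumes an already trained codebook split into lowband and highband parts and uses a squared-error search. All names and parameter values are hypothetical.

```python
import numpy as np


def estimate_highband_envelope(freqs, lowband_db, crossover_hz, high_freqs,
                               fit_octaves=1.0, slope_limits=(-24.0, 0.0)):
    """Extrapolate one straight-line asymptote in (log2 frequency, dB) space.

    The level and slope of the upper `fit_octaves` of the lowband are estimated
    by a least-squares line fit; the line pivots at the crossover frequency.
    `slope_limits` (dB per octave) is a hypothetical excursion constraint.
    """
    octaves = np.log2(np.asarray(freqs, dtype=float) / crossover_hz)
    region = (octaves >= -fit_octaves) & (octaves <= 0.0)
    slope, level = np.polyfit(octaves[region], np.asarray(lowband_db)[region], 1)
    slope = float(np.clip(slope, *slope_limits))
    high_octaves = np.log2(np.asarray(high_freqs, dtype=float) / crossover_hz)
    return level + slope * high_octaves


def vq_highband_envelope(lowband_env, codebook_low, codebook_high):
    """Find the codebook entry whose lowband part best matches (squared error)
    and return its highband part. The codebook is assumed to be trained already."""
    distances = np.sum((np.asarray(codebook_low) - lowband_env) ** 2, axis=1)
    return np.asarray(codebook_high)[np.argmin(distances)]
```

A multi-segment version would repeat the fit per segment with fixed pivot frequencies; the single segment keeps the sketch short.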



[…] maintaining a significant bit rate reduction.
In some codecs, it is possible to transmit the scalefactors for the entire spectral envelope while omitting the
highband subband samples, as shown in Fig. 24. Other codec standards stipulate that scalefactors and subband
samples must cover the same frequency range, i.e. scalefactors cannot be transmitted if the subband samples are
omitted. In such cases, there are several solutions. First, the highband spectral envelope information can be transmitted in
separate frames, where the frames have their own headers and optional error protection, followed by the data.
Regular decoders, not taking advantage of the present invention, will not recognise the headers and will therefore discard
the extra frames. In a second solution, the highband spectral envelope information is transmitted as auxiliary data
within the encoded bitstream. However, the available auxiliary data field must be large enough to hold the envelope
information. In cases where neither of the first two solutions is applicable, a third solution, where the highband
spectral envelope information is hidden as subband samples, may be applied. Subband scalefactors cover a large
dynamic range, typically exceeding 100 dB. It is thus possible to set an arbitrary number of subband scalefactors,
2505 in Fig. 25, to very low values, and to transmit the highband scalefactors "camouflaged" as subband samples,
2501. This way of transmitting the highband scalefactors to the decoder 2503 ensures compatibility with the
bitstream syntax. Hence, arbitrary data may be transmitted in this fashion. A related method exists where
information is coded into the subband sample stream [U.S. Pat. 5,687,191]. A fourth solution, Fig. 26, can be
applied when a coding system uses Huffman or other redundancy coding 2603. The subband samples for the
highband are then set to zero 2601, or to a constant value, so as to achieve a high redundancy.
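The third, "camouflage" solution relies on the large scalefactor range: a band whose scalefactor is set very low decodes to a negligible level in a regular decoder, yet its quantized samples can carry arbitrary values. The sketch below illustrates that idea on a hypothetical frame layout (scalefactors in dB, integer samples per band). The `-120 dB` value, the carrier-band choice and all function names are assumptions for illustration only.

```python
import numpy as np

HIDDEN_SF_DB = -120.0  # hypothetical "very low" scalefactor, far below audibility


def hide_envelope(scalefactors_db, samples, carrier_band, envelope_codes):
    """Encoder side: store envelope codes as the quantized samples of one band."""
    sf = np.array(scalefactors_db, dtype=float)
    q = np.array(samples, dtype=int)  # shape (bands, samples_per_band)
    codes = np.asarray(envelope_codes, dtype=int)
    if codes.size > q.shape[1]:
        raise ValueError("carrier band too small for the envelope data")
    sf[carrier_band] = HIDDEN_SF_DB
    q[carrier_band, :] = 0
    q[carrier_band, :codes.size] = codes
    return sf, q


def legacy_decode(scalefactors_db, samples):
    """A regular decoder: rescale each band's samples by 10^(sf/20)."""
    gains = 10.0 ** (np.asarray(scalefactors_db, dtype=float) / 20.0)
    return np.asarray(samples) * gains[:, None]


def extract_envelope(samples, carrier_band, n_codes):
    """Enhanced decoder side: read the camouflaged envelope codes back."""
    return np.asarray(samples)[carrier_band, :n_codes].copy()
```

The bitstream stays syntactically valid because only ordinary scalefactors and subband samples are written; a regular decoder reproduces the carrier band at roughly 120 dB below the code values.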

Transient response improvements

Transient related artifacts are common problems in audio codecs, and similar artifacts occur in the present invention. 
In general, patching generates spectral "zeros" or notches, corresponding to time domain pre- and post-echoes, i.e.
spurious transients before and after "true" transients. Although the P-blocks "fill in the zeros" for slowly varying tonal
signals, the pre- and post-echoes remain. The improved multiband method is intended to work on discrete sinusoids 
where the number of sinusoids is restricted to one per subband. Transients or noise in a subband can be viewed as a 
large number of discrete sinusoids within that subband. This generates intermodulation distortion. These artifacts are 
considered as additional quantization-noise sources connected to the replicated highband channels during transient 
intervals. Traditional methods to avoid pre- and post-echo artifacts in perceptual audio coders, for example adaptive 
window switching, may hence be used to enhance the subjective quality of the improved multiband method. By
using the transient detection provided by the codec or a separate detector and reducing the number of channels under
transient conditions, the "quantization noise" is forced not to exceed the time-dependent masking threshold. A
smaller number of channels is used during transient passages, whereas a larger number is used during tonal passages. Such
adaptive window switching is commonly used in codecs in order to trade frequency resolution for time resolution. 
Different methods may be used in applications where the filterbank size is fixed. One approach is to shape the 
"quantization noise" in time via linear prediction in the spectral domain. The transposition is then performed on the 
residual signal, which is the output of the linear prediction filter. Subsequently, an inverse prediction filter is applied 
to the original and spectrally replicated channels simultaneously. Another approach employs a compander system, i.e.
dynamic amplitude compression of the transient signal prior to transposition or coding, and a complementary 
expansion after transposition. It is also possible to switch between transposition methods in a signal dependent 
manner; for example, a high resolution filterbank transposition method is used for stationary signals, and a
time-variant pattern search prediction method is employed for transient signals.
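The linear prediction in the spectral domain and the compander approach can be sketched as follows. The prediction part runs a Levinson-Durbin recursion over one frame of real spectral coefficients (similar in spirit to temporal noise shaping in AAC), produces the residual on which transposition would operate, and shows that the inverse filter restores the input exactly. The compander uses a simple power law. The prediction order, the exponent `alpha` and all function names are hypothetical choices for illustration.

```python
import numpy as np


def autocorrelation(x, order):
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])


def levinson(r, order):
    """Levinson-Durbin recursion: prediction-error filter a, with a[0] = 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # silent or perfectly predictable frame
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def analysis(x, a):
    """Prediction filter across frequency: the residual that gets transposed."""
    xs = np.asarray(x, dtype=float)
    e = xs.copy()
    for n in range(len(xs)):
        for k in range(1, min(len(a) - 1, n) + 1):
            e[n] += a[k] * xs[n - k]
    return e


def synthesis(e, a):
    """Inverse prediction filter, applied after transposition."""
    y = np.array(e, dtype=float)
    for n in range(len(y)):
        for k in range(1, min(len(a) - 1, n) + 1):
            y[n] -= a[k] * y[n - k]
    return y


def compress(x, alpha=0.5):
    """Power-law amplitude compression (0 < alpha < 1) before transposition."""
    return np.sign(x) * np.abs(x) ** alpha


def expand(y, alpha=0.5):
    """Complementary expansion after transposition."""
    return np.sign(y) * np.abs(y) ** (1.0 / alpha)
```

In an SBR setting, `synthesis` would be run over the concatenated original and replicated coefficients so that the temporal shaping encoded in `a` also applies to the highband.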
Practical implementations

Using a standard signal processor or a powerful PC, real-time operation of an SBR-enhanced codec is possible. The
SBR-enhanced codec may also be hard-coded on a custom chip. It may also be implemented in various kinds of
systems for storage or transmission of signals, analogue or digital, using arbitrary codecs, Fig. 27 and Fig. 28. The
SBR-1 method may be integrated in a decoder or supplied as an add-on hardware or software post-processing
module. The SBR-2 method needs additional modification of the encoder. In Fig. 27 the analogue input signal is fed
to the A/D-converter 2701, forming a digital signal which is fed to an arbitrary encoder 2703, where source
coding is performed. The signal fed into the system may be of such a low-pass type that spectral bands within the
auditory range have already been discarded, or spectral bands are discarded in the arbitrary encoder. The resulting
lowband signals are fed to the multiplexer 2705, forming a serial bitstream which is transmitted or stored 2707. The
de-multiplexer 2709 restores the signals and feeds them to an arbitrary decoder 2711. The spectral envelope
information 2715 is estimated at the decoder 2713 and fed to the SBR-1 unit 2713, which transposes the lowband
signal to a highband signal and creates an envelope-adjusted wideband signal. Finally, the digital wideband signal is
converted 2717 to an analogue output signal.


The SBR-2 method needs additional modification of the encoder. In Fig. 28 the analogue input signal is fed to the 
A/D-converter 2801, forming a digital signal which is fed to an arbitrary encoder 2803, where source coding is
performed. The spectral envelope information is extracted 2805. The resulting signals, lowband subband samples or
coefficients and wideband envelope information, are fed to the multiplexer 2807, forming a serial bitstream which is
transmitted or stored 2809. The de-multiplexer 2811 restores the signals, lowband subband samples or coefficients
and wideband envelope information, and feeds them to an arbitrary decoder 2815. The spectral envelope
information 2813 is fed from the de-multiplexer 2811 to the SBR-2 unit 2817, which transposes the lowband signal
to a highband signal and creates an envelope-adjusted wideband signal. Finally, the digital wideband signal is
converted 2819 to an analogue output signal.
When only very low bitrates are available (Internet and slow telephone modems, AM broadcasting, etc.), mono
coding of the audio programme material is unavoidable. In order to improve the perceived quality and make the
programme more pleasant sounding, a simple "pseudo-stereo" generator, Fig. 29, is obtained by the introduction of a
tapped delay line 2901. This may feed 10 ms and 15 ms delayed signals at approximately -6 dB 2903 to each output
channel in addition to the original mono signal 2905. The pseudo-stereo generator offers a valuable perceptual
improvement at a low computational cost.
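A pseudo-stereo generator of this kind can be sketched as below. The 10 ms and 15 ms taps and the -6 dB level come from the description above; how the taps are routed is an assumption, since the text does not spell it out: here the 10 ms tap is added to the left channel and the 15 ms tap to the right channel, each on top of the unmodified mono signal. The function name and parameters are hypothetical.

```python
import numpy as np


def pseudo_stereo(mono, fs, delays_ms=(10.0, 15.0), gain_db=-6.0):
    """Return (left, right) from a mono signal using a tapped delay line.

    Assumption: the first tap feeds the left output and the second tap the
    right output; both outputs also contain the original mono signal.
    """
    mono = np.asarray(mono, dtype=float)
    g = 10.0 ** (gain_db / 20.0)  # -6 dB corresponds to a gain of about 0.5
    outputs = []
    for d_ms in delays_ms:
        d = int(round(d_ms * fs / 1000.0))  # tap position in samples
        delayed = np.zeros_like(mono)
        if d < len(mono):
            delayed[d:] = mono[: len(mono) - d]
        outputs.append(mono + g * delayed)
    return outputs[0], outputs[1]
```

The two differently delayed copies decorrelate the channels slightly, which creates the impression of width while costing only one delay line and a few additions per sample.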



The above-described embodiments are merely illustrative of the principles of the present invention for audio source
coding improvement. It is understood that modifications and variations of the arrangements and the details described
herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the
impending patent claims and not by the specific details presented by way of description and explanation of the
embodiments herein.



